Securing Behavior-based Opinion Spam Detection

Shuaijun Ge, Guixiang Ma, Sihong Xie, Philip S. Yu
DOI: https://doi.org/10.48550/arXiv.1811.03739
2018-11-09
Abstract: Review spam is prevalent in e-commerce, where it is used to maliciously manipulate product rankings and customers' decisions. While spam generated with simple spamming strategies can be detected effectively, hardened spammers can evade regular detectors with more advanced strategies. Previous work has paid more attention to evasion against text-based and graph-based detectors, while evasion against behavior-based detectors has been largely ignored, leaving vulnerabilities in spam detection systems. Since real evasion data are scarce, we first propose EMERAL (Evasion via Maximum Entropy and Rating sAmpLing) to generate spam that evades certain existing detectors. EMERAL can simulate spammers with different goals and levels of knowledge about the detectors, targeting different stages of the life cycle of target products. We show that in the evasion-defense dynamic, only a few evasion types are meaningful to the spammers, and no spammer can evade too many detection signals at the same time. We reveal that some evasions are quite insidious and can defeat all detection signals. We then propose DETER (Defense via Evasion generaTion using EmeRal), based on re-training the model on diverse evasive samples generated by EMERAL. Experiments confirm that DETER is more accurate in detecting both suspicious time windows and individual spam reviews. In terms of security, DETER is versatile enough to be vaccinated against diverse and unexpected evasions, is agnostic about the evasion strategy, and can be released without privacy concerns.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The paper addresses malicious reviewers in online product review systems who manipulate product rankings and consumer decisions by posting fake reviews. Text-based and graph-based detection methods can identify simple spam reviews effectively, but more advanced spamming strategies can evade them. Adversarial attacks against behavior-based detection methods, in contrast, have not been studied thoroughly, which leaves potential vulnerabilities in spam review detection systems. To this end, the paper makes two main contributions:

1. **EMERAL (Evasion via Maximum Entropy and Rating sAmpLing)**
   - **Purpose**: Generate evasive spam reviews against existing detectors, simulating spammers with different goals and levels of knowledge about the detectors who attack at different stages of the product life cycle.
   - **Method**: Generate spam reviews that can evade multiple detection signals by combining a maximum entropy model with rating sampling. Specifically, EMERAL generates a rating distribution that satisfies domain-specific constraints, so that the average product rating is raised without the spam being detected.
2. **DETER (Defense via Evasion generaTion using EmeRal)**
   - **Purpose**: Make the detection model more robust against unknown and diverse evasion strategies, using the diverse evasive samples generated by EMERAL.
   - **Method**: Retrain the detection model after adding the evasive samples generated by EMERAL to the training data. Experimental results show that DETER is more accurate in detecting suspicious time windows and individual spam reviews and adapts well to various evasion strategies.

### Main problems
- **Limitations of existing detection methods**: Existing text-based and graph-based detection methods can identify simple spam reviews effectively, but more advanced spamming strategies bypass them easily.
- **Vulnerabilities of behavior-based detection methods**: Adversarial attacks against behavior-based detection methods have received too little study, which leaves potential security vulnerabilities in the system.
- **Challenges in generating evasive samples**: Spam reviews must evade multiple detection signals while still satisfying domain-specific constraints.

### Solutions
- **EMERAL**: Generate evasive samples with a maximum entropy model, simulating spammers with different goals and knowledge levels.
- **DETER**: Use the diverse evasive samples generated by EMERAL to retrain the detection model and improve its robustness against unknown and diverse evasion strategies.

### Experimental verification
- **Dataset**: Experiments use datasets from Amazon and Yelp.
- **Evaluation metric**: AUC (Area Under the Curve) measures detection effectiveness.
- **Experimental results**: DETER performs well in detecting suspicious time windows and individual spam reviews, outperforming methods that rely on a single fixed detection signal as well as simple signal aggregation methods.

In conclusion, by proposing EMERAL and DETER, the paper aims to address the shortcomings of existing spam review detection methods against advanced evasion strategies and to improve the robustness and security of the detection system.
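
To make the maximum entropy idea concrete, the following is a minimal sketch and not the paper's EMERAL formulation. It assumes a single constraint, a target average rating on a 1 to 5 star scale, whereas EMERAL is described as handling several detection signals at once. Under that one constraint, the maximum entropy distribution has the form p(r) proportional to exp(λ·r), and λ can be found by bisection. The names `maxent_rating_distribution` and `entropy` are hypothetical and chosen only for illustration.

```python
import math

RATINGS = (1, 2, 3, 4, 5)  # assumed 5-star scale


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def maxent_rating_distribution(target_mean, ratings=RATINGS, tol=1e-10):
    """Maximum entropy distribution over `ratings` with mean `target_mean`.

    With only a mean constraint, the solution is p_i proportional to
    exp(lam * r_i). The mean increases monotonically with lam, so lam
    is located by bisection.
    """
    if not min(ratings) < target_mean < max(ratings):
        raise ValueError("target_mean must lie strictly inside the rating range")

    def distribution(lam):
        exponents = [lam * r for r in ratings]
        shift = max(exponents)  # subtract the max for numerical stability
        weights = [math.exp(e - shift) for e in exponents]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(p):
        return sum(r * q for r, q in zip(ratings, p))

    lo, hi = -50.0, 50.0  # bracket for lam; wide enough for a 1..5 scale
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(distribution(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return distribution((lo + hi) / 2)


if __name__ == "__main__":
    p = maxent_rating_distribution(4.2)
    print([round(q, 4) for q in p], round(entropy(p), 4))
```

A target mean equal to the midpoint of the scale (3.0) yields the uniform distribution, which is the unconstrained entropy maximum; pushing the target mean higher skews mass toward 5 stars while keeping the distribution as spread out as the constraint allows. Additional constraints, such as bounds on individual detection signals, would require a more general solver than this one-parameter bisection.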