Abstract: In recent years, the prevalence of fake reviews on online platforms has become a significant concern, because these deceptive reviews can mislead consumers and distort purchasing decisions. This paper explores methods for detecting fake reviews using machine learning techniques. We used the Deceptive Opinion Spam dataset, which contains truthful and deceptive reviews of 20 Chicago hotels. The dataset comprises 1600 reviews, evenly split among four groups of 400: truthful positive, deceptive positive, truthful negative, and deceptive negative. Our primary objective was to classify each review as truthful or deceptive using several machine learning algorithms. We constructed a data frame with columns for the review text, polarity class, and spamity class. Polarity indicates whether a review is positive or negative, while spamity distinguishes truthful from deceptive reviews. Stopwords were removed from the reviews using the NLTK package, and text mining techniques were applied to convert the text strings into numerical data. We also extracted parts of speech from the reviews to use as features in our models. We experimented with four classifiers: Naïve-Bayes, Support Vector Machine (SVM), Decision Tree, and Random Forest. The Naïve-Bayes classifier, specifically the Multinomial NB algorithm, achieved an accuracy of 89.13%. The SVM yielded an accuracy of 82.155%, while the Decision Tree reached only 65.55%. The Random Forest classifier achieved the highest accuracy, 91.72%. Confusion matrices generated with the sklearn.metrics module were used to verify the accuracy of each algorithm. Because the Random Forest and Naïve-Bayes classifiers performed best, these two models were selected for further analysis.
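The pipeline described above (stopword removal, conversion of text to word counts, Multinomial Naïve-Bayes classification, and a confusion matrix) can be illustrated with a minimal, standard-library-only sketch. It is not the implementation used in this study: the stopword list is a small hypothetical stand-in for NLTK's English stopwords, the class and function names are illustrative, and the toy reviews are invented for the example rather than taken from the dataset, so the sketch does not reproduce the reported accuracies.

```python
import math
from collections import Counter

# Hypothetical mini stopword list (assumption); the study used NLTK's list.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "i", "my", "we",
             "to", "of", "in", "it", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if w not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes on word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels))
                          for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values())
                            for c in self.classes}
        return self

    def predict_one(self, doc):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # skip words never seen in training
                    count = self.word_counts[c][w]
                    score += math.log((count + 1) /
                                      (self.total_words[c] + vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true label, columns the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy reviews invented for illustration only (not from the dataset).
train_docs = [
    "The room was clean and the staff was friendly",
    "Front desk lost my reservation and the room was noisy",
    "My husband and I had the most amazing luxurious experience",
    "Absolutely the most amazing hotel experience of my life",
]
train_labels = ["truthful", "truthful", "deceptive", "deceptive"]
test_docs = ["The staff was friendly and the room was clean",
             "The most amazing luxurious hotel"]
test_labels = ["truthful", "deceptive"]

model = MultinomialNB().fit(train_docs, train_labels)
predictions = model.predict(test_docs)
matrix = confusion_matrix(test_labels, predictions, ["truthful", "deceptive"])
accuracy = sum(t == p for t, p in zip(test_labels, predictions)) / len(test_labels)
print(predictions, matrix, accuracy)
```

In the confusion matrix, the diagonal entries count correctly classified reviews, so accuracy is the diagonal sum divided by the total number of test reviews. This is the sense in which a confusion matrix can be used to check a reported accuracy figure.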
Our findings indicate that these machine learning techniques can effectively identify fake reviews, thereby helping to mitigate their misleading impact on consumers. In future work, we aim to extend the study to datasets from other platforms, such as Amazon and Flipkart, and to explore different feature selection methods. We also plan to apply sentiment classification algorithms in tools such as Python, R, Statistical Analysis System (SAS), and Stata to detect fake reviews, and to compare the performance of these tools. This research was supported by the Technical University of Kerala. We extend our gratitude to our colleagues for their expertise, which significantly aided the research, although they may not concur with all the interpretations presented in this paper. Through this study, we contribute to the ongoing efforts to enhance the reliability of online reviews and protect consumers from deceptive practices.
