Abstract: In recent years, the prevalence of fake reviews on online platforms has become a significant concern, as these deceptive reviews can mislead consumers and influence purchasing decisions. This paper explores methods for detecting fake reviews using machine learning techniques. We used the Deceptive Opinion Spam dataset, which contains both truthful and deceptive reviews of 20 Chicago hotels. The dataset comprises 1,600 reviews, evenly split among four groups: truthful positive, deceptive positive, truthful negative, and deceptive negative. Our primary objective was to classify these reviews as either truthful or deceptive using several machine learning algorithms. We constructed a data frame with columns for the review text, polarity class, and spamity class. Polarity indicates whether a review is positive or negative, while spamity distinguishes truthful from deceptive reviews. Stopwords were removed from the reviews using the NLTK package, and text mining techniques were applied to convert the text into numerical features. We also extracted parts of speech from the reviews to use as features in our models. We experimented with four classification techniques: Naïve Bayes, Support Vector Machine (SVM), Decision Tree, and Random Forest. The Naïve Bayes classifier, specifically the Multinomial NB algorithm, achieved an accuracy of 89.13%. The SVM yielded an accuracy of 82.155%, while the Decision Tree algorithm reached an accuracy of 65.55%. The Random Forest classifier achieved the highest accuracy, at 91.72%. Confusion matrices, generated using the sklearn.metrics module, were used to check the accuracy of each algorithm. Given the superior performance of the Random Forest and Naïve Bayes classifiers, these two models were selected for further analysis.
Our findings indicate that these machine learning techniques can effectively identify fake reviews, thereby helping to mitigate their misleading impact on consumers. In future work, we aim to extend the study to datasets from other platforms, such as Amazon and Flipkart, and to explore different feature selection methods. We also plan to apply sentiment classification algorithms in several tools, such as Python, R, Statistical Analysis System (SAS), and Stata, to detect fake reviews and to compare the performance of these tools. This research was supported by the Technical University of Kerala. We extend our gratitude to our colleagues for their expertise, which significantly aided the research, although they may not concur with all the interpretations presented in this paper. Through this study, we contribute to ongoing efforts to improve the reliability of online reviews and protect consumers from deceptive practices.
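
The steps named in the abstract (stopword removal, conversion of text to word counts, Multinomial Naïve Bayes classification, and a confusion matrix) can be illustrated with a minimal sketch. It is not the implementation used in the study, which relied on NLTK and scikit-learn and also used part-of-speech features. To stay self-contained, the sketch uses only the Python standard library, and the stopword list, the toy reviews, and all function and class names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; the study used NLTK's list).
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "i", "my",
             "to", "of", "in", "we", "at", "this"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


class ToyMultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        class_sizes = Counter(labels)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels))
                          for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of P(token | class).
        num = self.word_counts[c][token] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, docs):
        preds = []
        for doc in docs:
            # Words never seen in training are ignored.
            tokens = [t for t in tokenize(doc) if t in self.vocab]
            scores = {c: self.log_prior[c]
                      + sum(self._log_likelihood(t, c) for t in tokens)
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy reviews invented for this sketch; they are not taken from the dataset.
train_texts = [
    "The room was clean and the staff at the front desk were helpful",
    "Parking cost extra and the elevator was slow but the bed was comfortable",
    "My husband and I had the most amazing luxurious experience ever",
    "Absolutely amazing luxurious hotel, the best experience of my life",
]
train_labels = ["truthful", "truthful", "deceptive", "deceptive"]

model = ToyMultinomialNB().fit(train_texts, train_labels)
new_preds = model.predict(["An amazing luxurious experience",
                           "The staff were helpful and the room was clean"])
train_matrix = confusion_matrix(train_labels, model.predict(train_texts),
                                labels=["truthful", "deceptive"])
```

With the real dataset, the toy reviews would be replaced by the review text and spamity columns of the data frame, and the confusion matrix would be computed on a held-out test split rather than on the training data.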
