Abstract: In recent years, the prevalence of fake reviews on online platforms has become a significant concern, as these deceptive reviews can mislead consumers and distort purchasing decisions. This paper explores methods for detecting fake reviews using machine learning techniques. We used the Deceptive Opinion Spam dataset, which contains truthful and deceptive reviews of 20 Chicago hotels. The dataset comprises 1600 reviews, evenly split among four categories: truthful positive, deceptive positive, truthful negative, and deceptive negative. Our primary objective was to classify each review as truthful or deceptive using several machine learning algorithms. We constructed a data frame with columns for the review text, polarity class, and spamity class. Polarity indicates whether a review is positive or negative, while spamity distinguishes truthful from deceptive reviews. Stopwords were removed from the reviews using the nltk package, and text mining techniques were applied to convert the text strings into numerical data. We also extracted parts of speech from the reviews to use as features in our models. We experimented with four classification techniques: Naïve-Bayes, Support Vector Machine (SVM), Decision Tree, and Random Forest classifiers. The Naïve-Bayes classifier, specifically the Multinomial NB algorithm, achieved an accuracy of 89.13%. The SVM yielded an accuracy of 82.155%, while the Decision Tree algorithm resulted in an accuracy of 65.55%. The Random Forest classifier achieved the highest accuracy, 91.72%. Confusion matrices generated with the sklearn metrics module corroborated the accuracy of each algorithm. Because the Random Forest and Naïve-Bayes classifiers performed best, these two models were selected for further analysis.
Our findings indicate that these machine learning techniques can effectively identify fake reviews, thereby helping to mitigate their misleading impact on consumers. In future work, we aim to extend the study to datasets from other platforms, such as Amazon and Flipkart, and to explore different feature selection methods. We also plan to apply sentiment classification algorithms in several tools, such as Python, R, Statistical Analysis System (SAS), and Stata, to detect fake reviews and to compare the performance of these tools. This research was supported by the Technical University of Kerala. We extend our gratitude to our colleagues for their expertise, which significantly aided the research, although they may not concur with all the interpretations presented in this paper. Through this study, we contribute to ongoing efforts to enhance the reliability of online reviews and protect consumers from deceptive practices.
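
The workflow summarized in the abstract (stopword removal, conversion of text to word counts, Multinomial Naïve-Bayes classification, and a confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library so that it runs without nltk or scikit-learn, and the stopword list, the class name `TinyMultinomialNB`, and the four training reviews are invented for illustration. In the setup the abstract describes, nltk's stopword list and sklearn's `MultinomialNB` and `confusion_matrix` would presumably play these roles.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the paper uses nltk's list).
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of",
             "in", "i", "my", "we", "this", "for", "at"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


class TinyMultinomialNB:
    """Hypothetical from-scratch Multinomial Naive Bayes on word counts,
    with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels))
                          for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values())
                      for c in self.classes}
        return self

    def predict(self, texts):
        vocab_size = len(self.vocab)
        predictions = []
        for text in texts:
            scores = {}
            for c in self.classes:
                score = self.log_prior[c]
                for w in tokenize(text):
                    if w in self.vocab:  # skip words unseen in training
                        score += math.log((self.word_counts[c][w] + 1)
                                          / (self.total[c] + vocab_size))
                scores[c] = score
            predictions.append(max(scores, key=scores.get))
        return predictions


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Invented toy reviews; the real dataset has 1600 labeled hotel reviews.
train_texts = [
    "The room was clean and the staff was helpful",
    "Check in took ten minutes and the elevator was slow",
    "My husband and I had the most amazing luxurious experience ever",
    "Absolutely amazing hotel, the most luxurious vacation of my life",
]
train_labels = ["truthful", "truthful", "deceptive", "deceptive"]
test_texts = [
    "The staff was helpful and the room was clean",
    "The most amazing luxurious hotel ever",
]
test_labels = ["truthful", "deceptive"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
matrix = confusion_matrix(test_labels, predictions, ["truthful", "deceptive"])
accuracy = sum(t == p for t, p in zip(test_labels, predictions)) / len(test_labels)
print(predictions, matrix, accuracy)
```

On data this small the result says nothing about real-world performance; the sketch only shows how the confusion matrix and accuracy figure are derived from the same set of predictions.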

Machine Cleaning of Online Opinion Spam: Developing a Machine-Learning Algorithm for Detecting Deceptive Comments

Towards a General Rule for Identifying Deceptive Opinion Spam

Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms

Finding Deceptive Opinion Spam by Any Stretch of the Imagination

Analyzing and Detecting Adversarial Spam on a Large-scale Online APP Review System.

Fake Reviews Detection using Supervised Machine Learning Algorithms

Voting for Deceptive Opinion Spam Detection

A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases

Are Your Comments Positive? A Self-Distillation Contrastive Learning Method for Analyzing Online Public Opinion

Exploring characteristics of online news comments and commenters with machine learning approaches

Deep Learning-Based Truthful and Deceptive Hotel Reviews

Learning to identify review spam

Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques

Abusive Language Detection in Online User Content

Comparison of Machine Learning and Sentiment Analysis in Detection of Suspicious Online Reviewers on Different Type of Data

User Perceptions of AI-Based Comment Filtering Technology

Fake social media news and distorted campaign detection framework using sentiment analysis & machine learning

Purging the Poison: A Machine Learning Approach to Filtering Toxic Comments

Toward a Language Modeling Approach for Consumer Review Spam Detection

Text Mining and Probabilistic Language Modeling for Online Review Spam Detection.

Fake Comment Detection Based on Sentiment Analysis