Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews Classification

Tomas Pranckevičius, Virginijus Marcinkevičius
DOI: https://doi.org/10.22364/BJMC.2017.5.2.05
Baltic Journal of Modern Computing
Abstract: Today, highly scalable computing environments make it possible to carry out various data-intensive natural language processing and machine-learning tasks. One of these is text classification, several aspects of which have recently been investigated by many data scientists. The authors of this paper investigate Naïve Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers implemented in Apache Spark, an in-memory computing platform. The paper compares these classifiers by evaluating classification accuracy as a function of the training data set size and the number of n-grams. In the experiments, short texts from Amazon product-review data were analyzed.
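To make the n-gram notion in the abstract concrete, the following is a minimal sketch of word-level n-gram extraction from a short review text. It is not the authors' Apache Spark pipeline: the function names `word_ngrams` and `ngram_features`, the lowercasing, the whitespace tokenization, and the choice to combine all n-grams from 1 up to a maximum n are assumptions made for illustration only.

```python
from typing import List


def word_ngrams(text: str, n: int) -> List[str]:
    """Return contiguous word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    # range() is empty when the text has fewer than n tokens, so the result is [].
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, max_n: int) -> List[str]:
    """Collect all n-grams for n = 1..max_n (an assumed feature scheme)."""
    features: List[str] = []
    for n in range(1, max_n + 1):
        features.extend(word_ngrams(text, n))
    return features


if __name__ == "__main__":
    # Hypothetical review text, not taken from the paper's data set.
    print(ngram_features("Great product works well", 2))
```

A larger maximum n produces more features per review, which is one way the "number of n-grams" could vary between experiments.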
Computer Science