Optimize Naïve Bayes Classifier Using Chi Square and Term Frequency Inverse Document Frequency For Amazon Review Sentiment Analysis

Anisa Falasari,Much Aziz Muslim
DOI: https://doi.org/10.52465/joscex.v3i1.68
2022-03-30
Journal of Soft Computing Exploration
Abstract:The rapid development of the internet has made information flow rapidly wich has an impact on the world of commerce. Some people who have bought a product will write their opinion on social media or other online site. Long-text buyer reviews need a machine to recognize opinions. Sentiment analysis applies the text mining method. One of the methods applied in sentiment analysis is classification. One of the classification algorithms is the naïve bayes classifier. Naïve bayes classifier is a classification method with good efficiency and performance. However, it is very sensitive with too many features, wich makes the accuracy low. To improve the accuracy of the naïve bayes classifier algorithm it can be done by selecting features. One of the feature selection is chi square. The selection of features with chi square calculation based on the top-K value that has been determined, namely 450. In addition, weighting features can also improve the accuracy of the naïve bayes classifier algorithm. One of the feature weighting techniques is term frequency inverse document frequency (TF-IDF). In this study, using sentiment labelled dataset (field amazon_labelled) obtained from UCI Machine Learning. This dataset has 500 positive reviews and 500 negative reviews. The accuracy of the naïve bayes classifier in the amazon review sentiment analysis was 82%. Meanwhile, the accuracy of the naïve bayes classifier by applying chi square and TF-IDF is 83%.
What problem does this paper attempt to address?