Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms.

Yang Liu, Jian-Wu Bi, Zhi-Ping Fan
DOI: https://doi.org/10.1016/j.eswa.2017.03.042
2017-01-01
Abstract: A framework for multi-class sentiment classification is proposed. A total of 3600 comparative experiments are conducted. Performances of different feature selection/machine learning algorithms are compared. The results are valuable for further studies on multi-class sentiment classification. Multi-class sentiment classification has a wide range of applications, yet studies on it remain relatively scarce. In this paper, a framework for multi-class sentiment classification is proposed, which consists of two parts: 1) selecting important features of texts using a feature selection algorithm, and 2) training a multi-class sentiment classifier using a machine learning algorithm. Experiments are then conducted to compare the performance of four popular feature selection algorithms (document frequency, CHI statistics, information gain and gain ratio) and five popular machine learning algorithms (decision tree, naïve Bayes, support vector machine, radial basis function neural network and K-nearest neighbor) in multi-class sentiment classification. The experiments are conducted on three public datasets comprising twelve data subsets, and 10-fold cross-validation is used to obtain the classification accuracy for each combination of feature selection algorithm, machine learning algorithm, feature set size and data subset. Based on the resulting 3600 classification accuracies (4 feature selection algorithms × 5 machine learning algorithms × 15 feature set sizes × 12 data subsets), the average classification accuracy of each algorithm is calculated, and the Wilcoxon test is used to check whether the differences between algorithms in multi-class sentiment classification are significant. The results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms. Similar comparisons are also conducted in terms of execution time. The obtained results should be valuable for improving existing multi-class sentiment classifiers and for developing new ones.
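The first part of the framework and the size of the experimental grid can be illustrated with a small sketch. It is not the authors' implementation: it assumes documents are represented as sets of tokens, scores each term by information gain (one of the four feature selection algorithms named in the abstract), and keeps the top k terms. The toy corpus, the function names (`entropy`, `information_gain`, `select_features`) and the placeholder indices for feature set sizes and data subsets are all hypothetical; the abstract does not state the actual sizes or subset names.

```python
import math
from collections import Counter
from itertools import product


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """H(class) minus H(class | term present or absent)."""
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    conditional = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            conditional += (size / len(docs)) * entropy(part.values())
    return entropy(Counter(labels).values()) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical three-class toy corpus (not from the paper's datasets).
docs = [
    {"great", "film"}, {"great", "plot"},
    {"okay", "film"}, {"okay", "plot"},
    {"awful", "film"}, {"awful", "plot"},
]
labels = ["pos", "pos", "neu", "neu", "neg", "neg"]
selected = select_features(docs, labels, k=3)

# Experimental grid described in the abstract. Feature set sizes and
# data subsets are placeholder indices, since the abstract gives only
# their counts (15 and 12).
feature_selectors = ["document frequency", "CHI statistics",
                     "information gain", "gain ratio"]
classifiers = ["decision tree", "naive Bayes", "support vector machine",
               "RBF neural network", "K-nearest neighbor"]
feature_set_sizes = range(15)
data_subsets = range(12)
grid = list(product(feature_selectors, classifiers,
                    feature_set_sizes, data_subsets))

print(selected)
print(len(grid))
```

In the toy corpus, the sentiment-bearing words separate the classes while "film" and "plot" occur equally in every class, so the latter receive zero information gain and are dropped. In the full framework, each grid entry would then train the chosen classifier on the selected features and record its 10-fold cross-validation accuracy.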