Performance Evaluation of Machine Learning Classifiers in Sentiment Mining

Vinodhini G, Chandrasekaran RM
DOI: https://doi.org/10.48550/arXiv.1402.3891
2014-02-17
Abstract: In recent years, machine learning classifiers have proved valuable in solving a variety of text classification problems. Sentiment mining is a kind of text classification in which messages are classified according to sentiment orientation, such as positive or negative. This paper extends the idea of evaluating the performance of various classifiers to show their effectiveness in sentiment mining of online product reviews. The product reviews are collected from Amazon. To evaluate the performance of the classifiers, various evaluation methods such as random sampling, linear sampling, and bootstrap sampling are used. Our results show that a support vector machine with the bootstrap sampling method outperforms the other classifiers and sampling methods in terms of misclassification rate.
Machine Learning, Computation and Language, Information Retrieval
What problem does this paper attempt to address?
This paper evaluates how different machine-learning classifiers perform in sentiment mining, specifically in the sentiment classification of online product reviews. The authors experimentally compare four common classifiers (decision tree, K-nearest neighbor, Naive Bayes, and support vector machine) under three sampling methods (random sampling, linear sampling, and bootstrap sampling) to determine which combination of classifier and sampling method yields the lowest misclassification rate.

### Main problems and objectives of the paper:

1. **Sentiment mining task**: Classify online product reviews according to sentiment orientation (such as positive or negative).
2. **Classifier selection**: Evaluate four common machine-learning classifiers (decision tree, K-nearest neighbor, Naive Bayes, and support vector machine).
3. **Influence of sampling methods**: Evaluate how the sampling method (random sampling, linear sampling, or bootstrap sampling) affects classifier performance.
4. **Data sources**: Use reviews of five different products (camera, mobile phone, iPod, laptop, and music player) collected from Amazon.
5. **Performance indicators**: Measure classifier performance by misclassification rate and use cross-validation to make the results more reliable.

### Core contributions of the paper:

- Reports that the support vector machine (SVM) combined with bootstrap sampling has the lowest misclassification rate in all tested product categories.
- Analyzes the influence of the sampling method on classifier performance and finds that bootstrap sampling performs significantly better than the other sampling methods.
- Outlines future research directions in sentiment mining and suggests further exploration of ensemble learning and genetic algorithms.

### Summary:

The main purpose of this paper is to evaluate, through empirical research, the performance of different machine-learning classifiers and sampling methods in sentiment mining tasks, and to provide an empirical basis and technical guidance for practical applications.
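
The three sampling methods and the misclassification rate named above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the 30% test fraction, and the seeds are hypothetical, and the definitions are assumptions because the summary does not spell them out. Here linear sampling is taken to be an order-preserving split, random sampling a shuffled split without replacement, and bootstrap sampling a draw with replacement whose never-drawn (out-of-bag) items form the test set.

```python
import random


def linear_split(n, test_fraction=0.3):
    """Linear sampling (assumed): keep the original order; the last block is the test set."""
    cut = n - int(round(n * test_fraction))
    indices = list(range(n))
    return indices[:cut], indices[cut:]


def random_split(n, test_fraction=0.3, seed=0):
    """Random sampling (assumed): shuffle the indices, then split without replacement."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    cut = n - int(round(n * test_fraction))
    return indices[:cut], indices[cut:]


def bootstrap_split(n, seed=0):
    """Bootstrap sampling (assumed): draw n training indices with replacement;
    indices that were never drawn (out-of-bag) form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute a rate over zero examples")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Usage with made-up labels: one of four predictions is wrong.
rate = misclassification_rate(["pos", "neg", "pos", "neg"],
                              ["pos", "pos", "pos", "neg"])
print(rate)  # 0.25
```

In a full comparison, each classifier would be trained on the training indices produced by each split and scored with `misclassification_rate` on the corresponding test indices, giving one error figure per classifier and sampling method.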