Feature subsumption for sentiment classification in multiple languages

Zhongwu Zhai, Hua Xu, Jun Li, Peifa Jia
DOI: https://doi.org/10.1007/978-3-642-13672-6_26
2010-01-01
Abstract: An open problem in machine learning-based sentiment classification is how to extract complex features that outperform simple features; figuring out which types of features are most valuable is another. Most studies focus primarily on character or word n-gram features, but substring-group features have not previously been considered in sentiment classification. In this study, substring-group features are extracted and selected for sentiment classification by means of a transductive learning-based algorithm. To demonstrate generality, experiments were conducted on three open datasets in three different languages: Chinese, English and Spanish. The experimental results show that the proposed algorithm's performance is usually superior to the best performance in related work, and that the proposed feature subsumption algorithm for sentiment classification is multilingual. Compared to the inductive learning-based algorithm, the experimental results also show that the transductive learning-based algorithm can significantly improve the performance of sentiment classification. As for term weighting, the experiments show that “tfidf-c” outperforms all other term weighting approaches in the proposed algorithm.