Feature Selection for Imbalanced Sentiment Classification

WANG Zhihao, WANG Zhongqing, LI Shoushan, LI Peifeng
DOI: https://doi.org/10.3969/j.issn.1003-0077.2013.04.017
2013-01-01
Abstract: With the rapid development of the Internet, sentiment classification has attracted great attention from researchers in natural language processing. In this paper, we focus on sentiment classification tasks where the data distribution is imbalanced (named imbalanced sentiment classification). To reduce the high-dimensional feature space in imbalanced sentiment classification, we investigate four classic feature selection (FS) methods that are popularly studied in traditional text categorization. Furthermore, three different feature selection modes are proposed and compared in this specific task. The experimental results demonstrate that feature selection can significantly reduce the dimension of the feature vector without any loss in classification performance. In addition, the results show that the information gain (IG) FS method, combined with the mode "feature selection after random under-sampling", performs best.
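The best-performing combination named in the abstract, IG applied after random under-sampling, can be illustrated with a minimal sketch. The abstract does not give implementation details, so several things here are assumptions: documents are represented as sets of tokens (binary term presence), under-sampling reduces every class to the size of the smallest class, and all function names (`entropy`, `information_gain`, `random_under_sample`, `select_features_after_under_sampling`) as well as the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a binary term-presence feature with respect to the labels."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def random_under_sample(docs, labels, seed=0):
    """Assumed scheme: randomly keep as many samples per class as the
    smallest class has, so that all classes end up the same size."""
    rng = random.Random(seed)
    by_class = {}
    for d, y in zip(docs, labels):
        by_class.setdefault(y, []).append(d)
    n_min = min(len(ds) for ds in by_class.values())
    out_docs, out_labels = [], []
    for y, ds in by_class.items():
        for d in rng.sample(ds, n_min):
            out_docs.append(d)
            out_labels.append(y)
    return out_docs, out_labels


def select_features_after_under_sampling(docs, labels, k, seed=0):
    """Balance the data first, then rank terms by IG on the balanced set."""
    bal_docs, bal_labels = random_under_sample(docs, labels, seed)
    vocab = set().union(*bal_docs)
    # Sort by descending IG; break ties alphabetically for determinism.
    ranked = sorted(
        vocab,
        key=lambda t: (-information_gain(bal_docs, bal_labels, t), t),
    )
    return ranked[:k]


# Hypothetical toy corpus: four positive reviews, two negative reviews.
toy_docs = [
    {"good", "movie"}, {"good", "plot"}, {"good", "acting"}, {"good", "fun"},
    {"bad", "movie"}, {"bad", "plot"},
]
toy_labels = ["pos", "pos", "pos", "pos", "neg", "neg"]

top_terms = select_features_after_under_sampling(toy_docs, toy_labels, k=2)
```

In this toy corpus, "good" and "bad" each separate the two classes perfectly in any balanced sample, so they rank first. The other modes and FS methods compared in the paper are not reproduced here.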