3WS-ITSC: Three-Way Sampling on Imbalanced Text Data for Sentiment Classification

Yu Fang, Zhao-Chen Li, Xin Yang, Fan Min
DOI: https://doi.org/10.1007/978-3-031-21244-4_30
2022-01-01
Abstract: Sentiment analysis is an important research direction in natural language processing. Data imbalance is a critical issue in text sentiment classification because it leads to high misclassification cost. This paper proposes a three-way sampling sentiment classification model for imbalanced text data that aims to reduce misclassification cost. Specifically, the model extracts boundary points through three-way sampling and then applies cost-sensitive learning to act on the sampled results. First, to reduce sampling time, the text data are converted into one-dimensional vectors by bag mapping. Second, three-way sampling is used to obtain boundary points that characterize the majority class. Finally, a sequential three-way sentiment classification algorithm predicts sentiment polarity. Experimental results show that the proposed model outperforms state-of-the-art sentiment classification methods when the test data are extremely imbalanced.
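The abstract does not say how three-way sampling picks boundary points. As a purely illustrative sketch, assuming a neighbourhood heuristic in the spirit of Borderline-SMOTE, each majority-class sample can be placed in one of three regions according to the share of minority-class samples among its k nearest neighbours. The function `three_way_regions`, the parameters `k`, `lower` and `upper`, and the use of Euclidean distance are all hypothetical; the inputs are arbitrary numeric vectors, and the paper's bag mapping is not modelled.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def three_way_regions(majority: List[Point], minority: List[Point], k: int = 5,
                      lower: float = 0.2, upper: float = 0.8
                      ) -> Tuple[List[int], List[int], List[int]]:
    """Split majority-class indices into (core, boundary, noise) regions.

    Hypothetical rule: a majority sample whose k nearest neighbours are mostly
    majority is 'core', mostly minority is 'noise', and mixed is 'boundary'.
    """
    # Majority samples come first, so majority sample i sits at index i here.
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    core, boundary, noise = [], [], []
    for i, p in enumerate(majority):
        # Euclidean distance to every other sample, nearest first.
        dists = sorted((math.dist(p, q), label)
                       for j, (q, label) in enumerate(labelled) if j != i)
        neighbours = dists[:k]
        if not neighbours:
            raise ValueError("need at least two samples in total")
        # Fraction of minority-class samples among the k nearest neighbours.
        ratio = sum(label for _, label in neighbours) / len(neighbours)
        if ratio <= lower:
            core.append(i)
        elif ratio >= upper:
            noise.append(i)
        else:
            boundary.append(i)
    return core, boundary, noise


majority = [(0, 0), (1, 0), (2, 0), (3, 0), (5.5, 0)]
minority = [(4, 0), (5, 0), (6, 0)]
print(three_way_regions(majority, minority, k=2))  # ([0, 1, 2], [3], [4])
```

Under this assumed rule, only the boundary indices would be kept as the points that characterize the majority class near the class border; core points are redundant and noise points sit inside the minority region.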
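The abstract also does not give the decision rule of the sequential three-way classifier. The sketch below assumes the standard decision-theoretic rough set formulation of three-way decisions: thresholds α and β are derived from six losses for accepting, deferring or rejecting a sample, and a sequential variant defers boundary cases to the next, finer stage, forcing a binary decision with threshold γ at the last stage. `Losses`, `thresholds`, `sequential_three_way`, the stand-in scorers and the loss values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass(frozen=True)
class Losses:
    """Hypothetical loss table; X is the (minority) class of interest.

    l_pp, l_bp, l_np: cost of accept / defer / reject when x is in X.
    l_pn, l_bn, l_nn: cost of accept / defer / reject when x is not in X.
    """
    l_pp: float
    l_bp: float
    l_np: float
    l_pn: float
    l_bn: float
    l_nn: float


def thresholds(l: Losses) -> Tuple[float, float, float]:
    """Decision-theoretic rough set thresholds (alpha, beta, gamma)."""
    alpha = (l.l_pn - l.l_bn) / ((l.l_pn - l.l_bn) + (l.l_bp - l.l_pp))
    beta = (l.l_bn - l.l_nn) / ((l.l_bn - l.l_nn) + (l.l_np - l.l_bp))
    gamma = (l.l_pn - l.l_nn) / ((l.l_pn - l.l_nn) + (l.l_np - l.l_pp))
    if not alpha > beta:
        raise ValueError("losses give no boundary region (need alpha > beta)")
    return alpha, beta, gamma


def three_way(p: float, alpha: float, beta: float) -> str:
    """Accept (POS), reject (NEG) or defer (BND) given Pr(X | x) = p."""
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"


def sequential_three_way(x: Any, stages: Sequence[Callable[[Any], float]],
                         losses: Losses) -> Tuple[str, int]:
    """Run scorers coarse-to-fine; return (decision, index of deciding stage)."""
    if not stages:
        raise ValueError("at least one stage is required")
    alpha, beta, gamma = thresholds(losses)
    for i, prob in enumerate(stages[:-1]):
        decision = three_way(prob(x), alpha, beta)
        if decision != "BND":
            return decision, i
    # Last stage: deferment is no longer allowed, so decide with gamma.
    last = len(stages) - 1
    return ("POS" if stages[last](x) >= gamma else "NEG"), last


def coarse(text: Any) -> float:  # stand-in scorer returning Pr(X | text)
    return 0.10


def fine(text: Any) -> float:  # stand-in finer-grained scorer
    return 0.25


costs = Losses(l_pp=0, l_bp=2, l_np=10, l_pn=1, l_bn=0.5, l_nn=0)
print(sequential_three_way("some review", [coarse, fine], costs))  # ('POS', 1)
```

Raising the cost of missing a minority-class sample (`l_np`) relative to a false alarm (`l_pn`) lowers β and γ, so the classifier rejects the minority class only at very low probabilities; this is one way cost-sensitive learning can counteract class imbalance.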