Sentiment Analysis on Chinese Health Forums: A Preliminary Study of Different Language Models

Yan Zhang, Yong Zhang, Jennifer Xu, Chunxiao Xing, Hsinchun Chen
DOI: https://doi.org/10.1007/978-3-319-29175-8_7
2016-01-01
Abstract: Sentiment analysis on Chinese health forums is challenging because of the characteristics of the language, the platform, and the domain. Our research investigates the impact of three factors on sentiment analysis: sentiment polarity distribution, language models, and model settings. We manually labeled a large sample of Chinese health forum posts. The sample showed an extremely unbalanced distribution, with only a very small percentage of negative posts, and we found that a balanced training set could produce higher accuracy than the unbalanced one. We also found that hybrid approaches, which combine multiple language-model-based sentiment analysis approaches, performed better than the individual approaches. Finally, we evaluated the effects of different model settings and improved overall accuracy by running the hybrid approaches in their optimal settings. Findings from this preliminary study provide deeper insight into the problem of sentiment analysis on Chinese health forums and will inform future sentiment analysis studies.
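The abstract does not say how the balanced training set was built or how the hybrid approaches combine their component models. As a minimal sketch, assuming random undersampling of the larger classes and simple majority voting over per-model labels (both are illustrative choices, not necessarily the authors' method), the two ideas could look like this. The function names `balance_by_undersampling` and `majority_vote` and the labels `"pos"`/`"neg"` are hypothetical.

```python
import random
from collections import Counter


def balance_by_undersampling(posts, labels, seed=0):
    """Return (post, label) pairs with every class cut down to the size
    of the smallest class.

    Assumption: random undersampling; the paper's actual balancing
    procedure is not described in the abstract.
    """
    rng = random.Random(seed)
    by_label = {}
    for post, label in zip(posts, labels):
        by_label.setdefault(label, []).append(post)
    # The rarest class (e.g. negative posts) sets the per-class size.
    n = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        for post in rng.sample(items, n):
            balanced.append((post, label))
    rng.shuffle(balanced)
    return balanced


def majority_vote(predictions):
    """Combine the labels several models assigned to one post.

    Assumption: plain majority voting stands in for the paper's hybrid
    approach. Ties go to the tied label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


posts = ["p1", "p2", "p3", "p4", "p5"]
labels = ["pos", "pos", "pos", "neg", "neg"]
balanced = balance_by_undersampling(posts, labels)
print(sorted(Counter(label for _, label in balanced).items()))  # [('neg', 2), ('pos', 2)]
print(majority_vote(["neg", "pos", "neg"]))  # neg
```

Undersampling discards majority-class posts, so with very few negative posts the balanced set can become small; oversampling the minority class is a common alternative with the opposite trade-off.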