Named Entity Recognition in Chinese News Comments on the Web.

Xiaojun Wan, Liang Zong, Xiaojiang Huang, Tengfei Ma, Houping Jia, Yuqian Wu, Jianguo Xiao
2011-01-01
Abstract: News comments are a new text genre of the Web 2.0 era. After reading news articles, many people write comments to express their opinions about recent events or topics. Because news comments are written freely and without checking, they differ greatly from formal news text. In particular, named entities in news comments often contain miswritten words, informal abbreviations, or aliases, which makes them difficult for machines to detect and understand. This paper addresses the task of named entity recognition in Chinese news comments on the Web. We propose to leverage the entity information in the referred news article to improve named entity recognition in its comments. Three different schemes are investigated for finding useful entities in the news article, which are then used to generate new features for the CRF model. Finally, a dictionary-based correction step is employed to further improve the results. We manually labelled a benchmark dataset of 60 news articles and 6000 comments downloaded from a popular Chinese news portal (www.sina.com.cn). The experimental results on the dataset show that our method is effective for this special task.
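The idea of turning article entities into CRF features can be illustrated with a minimal sketch. The abstract does not spell out the three schemes, so everything here is an assumption made for illustration: the function names `article_entity_flags` and `char_features`, the character-level B/I/O flag, exact longest-match lookup, and the neighbouring-character features are hypothetical and are not the authors' method. The output is the kind of per-character feature dictionary that CRF toolkits commonly accept.

```python
from typing import Dict, List, Set


def article_entity_flags(comment: str, article_entities: Set[str]) -> List[str]:
    """Flag each character of `comment` as B/I when it lies inside an exact
    match of an entity taken from the referred article, otherwise O.
    (Hypothetical scheme: exact string match, longest entity first.)"""
    flags = ["O"] * len(comment)
    # Try longer entities first so a long name is not split by a shorter one.
    ordered = sorted(article_entities, key=len, reverse=True)
    i = 0
    while i < len(comment):
        match = next((e for e in ordered if e and comment.startswith(e, i)), None)
        if match is None:
            i += 1
            continue
        flags[i] = "B"
        for j in range(i + 1, i + len(match)):
            flags[j] = "I"
        i += len(match)
    return flags


def char_features(comment: str, article_entities: Set[str]) -> List[Dict[str, str]]:
    """Build one feature dict per character, adding the article-entity flag
    as an extra feature next to simple context features (all assumed)."""
    flags = article_entity_flags(comment, article_entities)
    feats = []
    for i, ch in enumerate(comment):
        feats.append({
            "char": ch,
            "prev_char": comment[i - 1] if i > 0 else "<BOS>",
            "next_char": comment[i + 1] if i + 1 < len(comment) else "<EOS>",
            "in_article_entity": flags[i],
        })
    return feats
```

For example, `article_entity_flags("姚明真棒", {"姚明"})` returns `['B', 'I', 'O', 'O']`. Exact matching would miss the miswritten names and aliases that the abstract highlights, which is presumably why the paper investigates several schemes and adds a dictionary-based correction step afterwards.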