Genre Identification Of Chinese Finance Text Using Machine Learning Method

Jun Xu, Yuxin Ding, Xiaolong Wang, Yonghui Wu
DOI: https://doi.org/10.1109/icsmc.2008.4811318
2008-01-01
Abstract: Document genre is one of the most distinguishing features in information retrieval and can bring order to search results. Genre classification is concerned not with the topic of a document but with its genre. In this paper, we examine the effectiveness of machine learning techniques for genre classification of Chinese text on a single topic, namely finance. Based on the likelihood ratio test, we present a new method for selecting feature terms that clearly improves performance and outperforms other methods with up to 80% of terms removed. In empirical results with an SVM classifier on real-world corpora, we find that this method selects features more effectively and that the likelihood ratio is a reliable measure for selecting informative features.
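The abstract does not give the exact form of the likelihood ratio statistic, so the following is only a minimal sketch of how likelihood-ratio term selection might look, assuming the common log-likelihood ratio (G²) computed over a 2x2 term-by-genre contingency table of document counts. The function names `llr_score` and `select_terms`, the set-of-tokens document representation, and the `keep_fraction` parameter are all hypothetical; `keep_fraction=0.2` simply mirrors the 80% term removal mentioned above.

```python
import math


def _cell(observed, expected):
    # Contribution of one contingency-table cell to G^2; empty cells add 0.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def llr_score(a, b, c, d):
    """Log-likelihood ratio (G^2) for a 2x2 table of document counts.

    a: documents of the target genre that contain the term
    b: documents of other genres that contain the term
    c: documents of the target genre without the term
    d: documents of other genres without the term
    (Assumed formulation; the paper's exact statistic may differ.)
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    # Expected counts under independence of term and genre.
    cells = [
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    ]
    return 2.0 * sum(_cell(o, e) for o, e in cells)


def select_terms(docs, labels, genre, keep_fraction=0.2):
    """Rank terms by G^2 against one genre and keep the top fraction.

    docs: list of token sets, one per document (hypothetical representation)
    labels: genre label of each document
    """
    vocab = sorted(set().union(*docs))
    scored = []
    for term in vocab:
        a = sum(1 for doc, lab in zip(docs, labels) if lab == genre and term in doc)
        b = sum(1 for doc, lab in zip(docs, labels) if lab != genre and term in doc)
        c = sum(1 for doc, lab in zip(docs, labels) if lab == genre and term not in doc)
        d = len(docs) - a - b - c
        scored.append((llr_score(a, b, c, d), term))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    n_keep = max(1, int(len(vocab) * keep_fraction))
    return [term for _, term in scored[:n_keep]]
```

Note that G² is two-sided: a term that is strongly absent from a genre scores as high as one that is strongly present in it. The retained terms would then be used as features for a classifier such as an SVM.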