Topic Detection Technology for Chinese Text Based on Statistics and Semantic Information

FENG Jin, LI Chunping
DOI: https://doi.org/10.3321/j.issn:1000-0054.2005.09.015
2005-01-01
Abstract: The need to extract the main information from Chinese texts has become pressing, because the complexity of Chinese word segmentation has partly restricted the development of Chinese information retrieval. This paper proposes a novel extraction method. The method extracts the keywords and phrases that express the main idea of a text by combining Chinese word segmentation, frequent-word search, and part-of-speech computation. The extracted words are then scored and ranked. Experiments were run on the People's Daily Corpus and on real-world texts such as web pages and emails. The results show that the accuracy of the approach can exceed 66% on the People's Daily Corpus, and that it also performs well on the real-world texts.
Key words: information retrieval; Chinese
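The abstract names the stages of the method (segmentation, frequent-word search, part-of-speech computation, scoring and ranking) but does not give the scoring formula. The following is a minimal sketch of how such a pipeline could rank candidate keywords, not the authors' implementation. It assumes the text has already been segmented and POS-tagged by some external tool; the function name score_keywords, the POS_WEIGHTS table and its values, and the "frequency times POS weight" score are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical POS weights (n = noun, v = verb, a = adjective).
# The abstract does not state the actual weights; these are assumptions.
POS_WEIGHTS = {"n": 1.0, "v": 0.5, "a": 0.3}


def score_keywords(tagged_tokens, top_k=5):
    """Rank words by (frequency x POS weight), highest score first.

    tagged_tokens: iterable of (word, pos) pairs produced by an
    external segmenter/tagger (assumed, not part of this sketch).
    """
    counts = Counter()
    pos_of = {}
    for word, pos in tagged_tokens:
        # Skip words whose POS carries no weight (e.g. particles).
        if pos in POS_WEIGHTS:
            counts[word] += 1
            pos_of.setdefault(word, pos)  # keep the first tag seen

    scored = [
        (word, freq * POS_WEIGHTS[pos_of[word]])
        for word, freq in counts.items()
    ]
    # Sort by descending score, breaking ties alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy input: a pre-segmented, pre-tagged sentence fragment.
tokens = [
    ("经济", "n"), ("发展", "v"), ("经济", "n"), ("的", "u"),
    ("增长", "v"), ("经济", "n"), ("发展", "v"),
]
ranked = score_keywords(tokens, top_k=2)
print(ranked)
```

In this toy input the noun 经济 occurs three times and outranks the verb 发展, which occurs twice; the particle 的 is dropped because its tag has no weight. A real system would replace the toy token list with the output of a Chinese segmenter and tagger, and would tune or replace the weighting.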