Automatic keyphrase extraction from Chinese news documents

Houfeng Wang, Sujian Li, Shiwen Yu
DOI: https://doi.org/10.1007/11540007_80
2006-01-01
Abstract: This paper presents a framework for automatically supplying keyphrases for a Chinese news document. The framework works in three steps. First, it extracts Chinese character strings from the source article as an initial set of keyphrase candidates, based on the frequency and length of the strings. Second, it filters unimportant candidates out of the initial set using elimination rules and transforms vague ones into their canonical forms according to a controlled list of synonymous terms and an abbreviation list. Finally, it selects the best items from the remaining candidates by a score measure. The approach is tested on the People's Daily corpus, and the experimental results are satisfactory.
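
The pipeline described in the abstract (candidate extraction, filtering and normalization, scoring) can be illustrated with a minimal Python sketch. The abstract does not give the actual thresholds, elimination rules, term lists, or score formula, so everything concrete here is an assumption made for illustration: the length and frequency bounds, the "drop strings that start or end with a function character" rule, the `canonical` mapping, and the frequency-times-length score are all hypothetical stand-ins, not the authors' method.

```python
import re
from collections import Counter


def extract_candidates(text, min_len=2, max_len=4, min_freq=2):
    """Collect character n-grams and keep those that are frequent enough.

    The bounds are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    # Split on anything outside the basic CJK block so that no candidate
    # crosses punctuation or whitespace.
    for segment in re.split(r"[^\u4e00-\u9fff]+", text):
        for n in range(min_len, max_len + 1):
            for i in range(len(segment) - n + 1):
                counts[segment[i:i + n]] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}


def filter_candidates(candidates, stop_chars):
    """Hypothetical elimination rule: drop strings that begin or end
    with a function character such as a particle."""
    return {
        s: c
        for s, c in candidates.items()
        if s[0] not in stop_chars and s[-1] not in stop_chars
    }


def normalize(candidates, canonical):
    """Map synonyms and abbreviations to a canonical form and merge counts.

    `canonical` stands in for the controlled term and abbreviation lists.
    """
    merged = Counter()
    for s, c in candidates.items():
        merged[canonical.get(s, s)] += c
    return dict(merged)


def select_keyphrases(candidates, top_k=3):
    """Rank by an assumed score (frequency times length); ties are broken
    by the string itself so the result is deterministic."""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: (-kv[1] * len(kv[0]), kv[0]),
    )
    return [s for s, _ in ranked[:top_k]]


def keyphrases(text, stop_chars, canonical, top_k=3):
    """Run the three stages in order."""
    candidates = extract_candidates(text)
    candidates = filter_candidates(candidates, stop_chars)
    candidates = normalize(candidates, canonical)
    return select_keyphrases(candidates, top_k)
```

For example, calling `keyphrases("北大,北大,北京大学,北京大学", set(), {"北大": "北京大学"}, top_k=1)` merges the abbreviation 北大 into 北京大学 before scoring, which is the role the abstract assigns to the abbreviation list. A real system would need far richer elimination rules and a tuned score.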