Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages

Xinghua Li, Xindong Wu, Xuegang Hu, Fei Xie, Zhaozhong Jiang
DOI: https://doi.org/10.1109/icdmw.2008.122
2008-01-01
Abstract: This paper presents a new keyword extraction algorithm for Chinese news Web pages that uses lexical chains and word co-occurrence, combined with frequency features, cohesion features, and correlation features. A lexical chain is a sequence of semantically related words that is an external manifestation of a text's cohesion, and it represents the semantic content of a portion of the text. Word co-occurrence distribution, a statistical model widely used in natural language processing, reflects the correlation between words. Our proposed algorithm, KELCC, combines lexical chains and word co-occurrence to extract keywords from Chinese news Web pages. The algorithm is not domain-specific and can be applied to a single Web page without a corpus. Experiments on randomly selected Web pages demonstrate the quality of the keywords extracted by KELCC.
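To make the idea of combining frequency, cohesion, and correlation features concrete, here is a minimal Python sketch that ranks the words of one pre-segmented document by the sum of three normalized scores. It is not KELCC: the abstract does not give the authors' formulas or weights. Assumptions: word segmentation has already been done (English tokens stand in for segmented Chinese words); lexical chains are crudely approximated by grouping words that share an id in a hypothetical thesaurus mapping `related`; co-occurrence is counted at the sentence level; and the equal weights `w_freq`, `w_chain`, `w_cooc` and the function name `extract_keywords` are illustrative choices.

```python
from collections import Counter
from itertools import combinations


def extract_keywords(sentences, related, top_k=5,
                     w_freq=1.0, w_chain=1.0, w_cooc=1.0):
    """Rank the words of ONE document by frequency, chain, and co-occurrence.

    sentences: list of token lists (assumes word segmentation is already done).
    related:   hypothetical thesaurus mapping word -> semantic group id; words
               sharing an id are treated as one (greatly simplified) lexical chain.
    The weights are illustrative defaults, not values from the KELCC paper.
    """
    # Frequency feature: raw term counts within the single document.
    freq = Counter(tok for sent in sentences for tok in sent)
    if not freq:
        return []

    # Cohesion feature (simplified lexical chains): a chain's strength is the
    # total number of occurrences of all its member words in the document.
    chain_strength = Counter()
    for word, count in freq.items():
        if word in related:
            chain_strength[related[word]] += count

    # Correlation feature: for each word, count its pairings with other
    # distinct words in the same sentence, summed over all sentences.
    cooc = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            cooc[a] += 1
            cooc[b] += 1

    # Normalize each feature to [0, 1] so the weights are comparable.
    max_freq = max(freq.values())
    max_chain = max(chain_strength.values(), default=0) or 1
    max_cooc = max(cooc.values(), default=0) or 1

    scores = {}
    for word, count in freq.items():
        # Words outside the thesaurus get a chain score of 0.
        chain = chain_strength.get(related.get(word), 0) / max_chain
        scores[word] = (w_freq * count / max_freq
                        + w_chain * chain
                        + w_cooc * cooc[word] / max_cooc)

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Toy document: three segmented "sentences" and a hypothetical thesaurus.
SENTENCES = [
    ["earthquake", "hits", "city"],
    ["quake", "damage", "city"],
    ["rescue", "teams", "city"],
]
RELATED = {"earthquake": "disaster", "quake": "disaster", "damage": "disaster"}

keywords = extract_keywords(SENTENCES, RELATED, top_k=3)
print(keywords)  # ['city', 'damage', 'earthquake']
```

Here "city" wins on frequency and co-occurrence, while the three members of the "disaster" chain outrank equally rare words such as "hits" because the chain feature pools their occurrences. Every feature is computed from the one document, so, like the algorithm described in the abstract, the sketch needs no external corpus; only the hypothetical `related` mapping stands in for a lexical resource.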