Chinese Keyword Extraction Based on N-Gram and Word Co-occurrence

Hui Jiao, Qian Liu, Hui-bo Jia
DOI: https://doi.org/10.1109/CIS.Workshops.2007.42
2007-01-01
Abstract: This paper presents a new Chinese text encoding method based on the Chinese word and establishes a new Chinese document format that addresses the automatic segmentation problem. The method makes the word the smallest unit of information, so Chinese text analysis under it does not rely on segmentation. On this word-based platform, N-gram and word co-occurrence statistical analysis are combined in a Chinese keyword extraction experiment. First, candidate keywords are extracted with a bi-gram model. Then, a set of co-occurrences between each word in the bi-grams and the frequent words is generated. The co-occurrence distribution indicates the importance of each word. Based on this analysis, keywords are chosen from the bi-grams. The algorithm applies to a single document without using a corpus, and the experimental results are satisfactory.
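The pipeline described in the abstract (bi-gram candidates, co-occurrence with frequent words, then selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how co-occurrence is counted or how importance is scored, so the sentence-level co-occurrence window, the chi-square-style deviation score, and all names here (`extract_keywords`, `top_frequent`, `top_k`) are assumptions made for illustration. The input is assumed to be a single document already delimited into words, as the word-based format implies.

```python
from collections import Counter


def extract_keywords(sentences, top_frequent=5, top_k=5):
    """Rank bi-grams of one document by the co-occurrence bias of their words.

    sentences: list of sentences, each a list of words (already delimited).
    Returns up to top_k bi-grams as (word, word) tuples, best first.
    """
    # Step 1: every adjacent word pair is a candidate keyword.
    word_counts = Counter(w for s in sentences for w in s)
    bigram_counts = Counter(bg for s in sentences for bg in zip(s, s[1:]))

    # Step 2: the most frequent words serve as the reference set.
    frequent = [w for w, _ in word_counts.most_common(top_frequent)]

    # Step 3: count sentence-level co-occurrence (an assumed window)
    # between each word and each frequent word.
    cooc = {w: Counter() for w in word_counts}
    for s in sentences:
        present = set(s)
        for w in present:
            for g in frequent:
                if g != w and g in present:
                    cooc[w][g] += 1

    col_totals = Counter()
    for row in cooc.values():
        col_totals.update(row)
    grand_total = sum(col_totals.values())

    def word_score(w):
        # Chi-square-style deviation (an assumed scoring rule): a word whose
        # co-occurrence distribution departs from the overall one scores high.
        row = cooc[w]
        row_total = sum(row.values())
        if row_total == 0 or grand_total == 0:
            return 0.0
        score = 0.0
        for g in frequent:
            expected = row_total * col_totals[g] / grand_total
            if g != w and expected > 0:
                score += (row[g] - expected) ** 2 / expected
        return score

    # Step 4: choose keywords from the bi-grams using their words' scores.
    scored = {bg: word_score(bg[0]) + word_score(bg[1]) for bg in bigram_counts}
    ranked = sorted(scored, key=lambda bg: (-scored[bg], bg))
    return ranked[:top_k]
```

Only the document itself is consulted, which matches the abstract's claim that no corpus is needed. Ties are broken alphabetically so that the ranking is deterministic.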