A Kind of Automatic Text Keyphrase Extraction Method Based on Word Co-occurrence
Geng Huan-Tong, Cai Qing-Sheng, Yu Kun, Zhao Peng
DOI: https://doi.org/10.3321/j.issn:0469-5097.2006.02.006
2006-01-01
Abstract: Advances in high-volume storage media have led to an explosion in the amount of machine-readable text. Keyphrase extraction is one of the fundamental tasks of natural language processing. Building on a study of existing keyphrase extraction methods, this paper proposes a novel automatic text keyphrase extraction method based on word co-occurrence. In addition to word frequency statistics, the method uses text subject information derived from a word co-occurrence graph and the linkage information between different text subjects. Our goal is to extract keyphrases whose content most accurately matches the specific and unique interests of the user. The extracted keyphrases represent the asserted main point of a document, and the algorithm does not rely on external resources such as natural language processing tools or a document corpus. The algorithm segments a graph that represents the co-occurrence between terms in a document into clusters. Each cluster corresponds to a concept on which the author's idea is based, and the terms ranked highest by a statistic based on each term's relationship to these clusters are selected as keyphrases. The experimental results show that the extracted terms match the author's point quite accurately, even though the method does not use the average frequency of each term in a corpus; that is, the method is a content-sensitive, domain-independent indexing device. Its purpose is to find words that occur infrequently but contribute greatly to the text's subject. The greatest benefit is the extraction of infrequent words that carry the point of the document, i.e., the concepts presented by the author. This merit can help satisfy search engine users with unique interests or ideas.
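The pipeline described in the abstract (build a term co-occurrence graph from a single document, segment it into clusters, then rank terms by their relationship to those clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the ranking statistic, so the score used here (number of clusters a term co-occurs with, then total co-occurrence strength) is an assumption, as are the naive tokenizer, the small STOPWORDS set, sentence-level co-occurrence, connected components as the clustering step, and the hypothetical parameters n_frequent, min_cooc and top_k.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed, deliberately tiny stopword list (not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "with", "on", "for"}


def tokenize(text):
    """Split text into sentences of lowercase, non-stopword tokens."""
    sentences = []
    for raw in re.split(r"[.!?]+", text):
        words = [w for w in re.findall(r"[a-z]+", raw.lower()) if w not in STOPWORDS]
        if words:
            sentences.append(words)
    return sentences


def cooccurrence(sentences):
    """Count, for each unordered word pair, the sentences containing both."""
    pairs = Counter()
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs


def clusters_of(nodes, edges):
    """Connected components of the graph, found by depth-first search."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components


def extract_keyphrases(text, n_frequent=6, min_cooc=2, top_k=3):
    """Rank terms by how many co-occurrence clusters they are linked to."""
    sentences = tokenize(text)
    freq = Counter(w for s in sentences for w in s)
    pairs = cooccurrence(sentences)

    # Graph nodes: the most frequent terms; edges: strongly co-occurring pairs.
    frequent = [w for w, _ in freq.most_common(n_frequent)]
    fset = set(frequent)
    edges = [(a, b) for (a, b), c in pairs.items()
             if c >= min_cooc and a in fset and b in fset]
    # Keep only multi-term components as concept clusters.
    clusters = [c for c in clusters_of(frequent, edges) if len(c) > 1]

    def link(word, cluster):
        # Total co-occurrence between a word and the other members of a cluster.
        return sum(pairs[tuple(sorted((word, m)))] for m in cluster if m != word)

    scores = {}
    for w in freq:
        strengths = [link(w, c) for c in clusters]
        scores[w] = (sum(1 for s in strengths if s > 0), sum(strengths))

    # Sort by clusters touched, then total strength, then alphabetically.
    ranked = sorted(freq, key=lambda w: (-scores[w][0], -scores[w][1], w))
    return ranked[:top_k]
```

On the toy input "Graph cluster. Graph cluster. Query search. Query search. Keyphrase graph query.", the sketch returns ["graph", "query", "keyphrase"]: the word "keyphrase" occurs only once but links both clusters, so it outranks the more frequent "cluster" and "search". This mirrors the abstract's point about infrequent words that contribute strongly to the text's subject.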