Keyword Extraction using the Word Co-occurrence Network Properties that is Independent of Languages and Document Types and Its Evaluation by Prediction of Headline Words

Yuki YAMAMOTO, Ryohei ORIHARA
DOI: https://doi.org/10.1527/tjsai.24.303
2009-01-01
Transactions of the Japanese Society for Artificial Intelligence
Abstract: A word co-occurrence graph based on co-occurrence of words within sentences is known to have the characteristics of a small-world and scale-free network. We built a keyword extraction algorithm using its betweenness-pass parameter in addition to comprehensive network parameters that include the clustering coefficient, the average path length and the number of links. Making use of the relationship between an article and its headline in a newspaper, we applied an SVM algorithm to learn the properties of the network parameters that characterize keywords, and tuned a keyword evaluation function composed of these parameters. We show that our algorithm outperforms a past study that used a similar technique. Moreover, the learned model can be applied successfully to documents written in another language and/or documents of other types.
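As a rough illustration of the network parameters named in the abstract, the following minimal Python sketch builds a sentence-level word co-occurrence graph and computes the number of links and the clustering coefficient per word, plus the average path length of the graph. It is not the authors' implementation: whitespace tokenization, the function names and the exact parameter definitions are assumptions made for this example, and the betweenness-pass parameter and the SVM training step are omitted because the abstract does not define them.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Link every pair of distinct words that appear in the same sentence.

    Assumption: words are lowercased and split on whitespace.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for w in words:
            graph[w]  # register the word even if it has no neighbours
        for a, b in combinations(sorted(words), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * linked / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def node_features(graph):
    """Per-word features: number of links and clustering coefficient."""
    return {
        w: {"links": len(graph[w]), "clustering": clustering_coefficient(graph, w)}
        for w in graph
    }


if __name__ == "__main__":
    g = build_cooccurrence_graph(["a b c", "c d"])
    print(node_features(g))
    print(average_path_length(g))
```

Following the abstract, per-word features of this kind could serve as input to a classifier such as an SVM, with membership in the article's headline as the label; that step is left out of the sketch.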