Automatic Keyword Extraction Using Word Embedding and Clustering

Ping Zeng, Qingping Tan, Ying Yan, Qinzheng Xie, Jianjun Xu, Wei Cao
DOI: https://doi.org/10.1109/iccsec.2017.8447033
2017-01-01
Abstract: Existing word-frequency-based algorithms for keyword extraction do not consider the semantic relationships among words. Moreover, word-graph-based algorithms cannot distinguish multiple topics, and topic-model-based algorithms possess high time complexity. All of these keyword extraction algorithms exhibit limitations. This paper proposes a new word-embedding-based algorithm, namely, WEC, for keyword extraction. The algorithm incorporates word frequency, effects of word co-occurrence, and semantic relationship among contexts. The algorithm also estimates the final weights of words with cosine similarity and pointwise mutual information and extracts topics by clustering. Experimental results show that the WEC algorithm outperforms state-of-the-art keyword extraction methods on four datasets when tested under various evaluation metrics.
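The abstract names the ingredients of WEC (word frequency, co-occurrence, cosine similarity, pointwise mutual information, and clustering) but does not give the formulas. The following is a minimal sketch of how such ingredients could be combined, not the authors' method: the weighting rule (frequency plus PMI times cosine), the greedy threshold clustering, the function names, the `threshold` parameter, and the two-dimensional toy embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pmi_table(sentences):
    """Sentence-level PMI: log(p(x, y) / (p(x) * p(y))) for co-occurring pairs."""
    n = len(sentences)
    word_df, pair_df = Counter(), Counter()
    for sent in sentences:
        vocab = sorted(set(sent))  # sorted so each pair has one canonical key
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (x, y): math.log((c / n) / ((word_df[x] / n) * (word_df[y] / n)))
        for (x, y), c in pair_df.items()
    }


def score_words(sentences, embeddings):
    """Assumed weighting (not from the paper): frequency + positive PMI * cosine."""
    freq = Counter(w for s in sentences for w in s)
    scores = {w: float(c) for w, c in freq.items()}
    for (x, y), pmi in pmi_table(sentences).items():
        bonus = max(pmi, 0.0) * cosine(embeddings[x], embeddings[y])
        scores[x] += bonus
        scores[y] += bonus
    return scores


def extract_keywords(sentences, embeddings, threshold=0.8):
    """Greedy clustering stand-in: visit words by descending score and open a
    new cluster whenever a word is not close to any existing cluster seed.
    Each seed is the top-scoring word of its cluster and serves as its keyword."""
    scores = score_words(sentences, embeddings)
    seeds = []
    for word in sorted(scores, key=scores.get, reverse=True):
        if all(cosine(embeddings[word], embeddings[s]) < threshold for s in seeds):
            seeds.append(word)
    return seeds


# Toy data (hypothetical): two topics with hand-made 2-D "embeddings".
sentences = [
    ["neural", "network", "training"],
    ["neural", "network", "model"],
    ["stock", "market", "price"],
    ["stock", "market", "crash"],
]
embeddings = {
    "neural": (1.0, 0.1), "network": (0.9, 0.2),
    "training": (0.8, 0.3), "model": (0.85, 0.25),
    "stock": (0.1, 1.0), "market": (0.2, 0.9),
    "price": (0.3, 0.8), "crash": (0.25, 0.85),
}
keywords = extract_keywords(sentences, embeddings)
print(keywords)
```

On this toy input the sketch yields one keyword per topic, which illustrates why clustering helps with the multi-topic limitation the abstract attributes to word-graph-based methods. A real implementation would use trained word embeddings and a proper clustering algorithm in place of the greedy threshold rule.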