Keyword extraction based on sequential pattern mining

Jiajia Feng, Fei Xie, Xuegang Hu, Peipei Li, Jie Cao, Xindong Wu
DOI: https://doi.org/10.1145/2043674.2043685
2011-01-01
Abstract: Keyword extraction is the task of automatically extracting keywords that capture the main topic of a given document. This paper proposes a new keyword extraction algorithm based on sequential patterns. After preprocessing, a document is represented as sequences of words, to which a sequential pattern mining algorithm is applied to mine important sequential patterns that reflect the semantic relatedness between words. Both statistical features and pattern features of words are used to build the keyword extraction model. The algorithm is language-independent and does not require a semantic dictionary to obtain semantic features. Experimental results on Chinese journal articles show that the proposed algorithm always outperforms the baseline method KEA.
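The pipeline outlined in the abstract (word sequences, then pattern mining, then combined statistical and pattern features) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes whitespace-delimited Latin-script text, restricts patterns to ordered word pairs within a small gap, and combines term frequency with pattern support through a simple product. The function names `to_sequences`, `mine_pairs`, and `score_words`, and the parameters `min_support` and `max_gap`, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def to_sequences(text):
    """Split text into sentences; each sentence becomes a list of lowercase words.

    Assumption: a regex tokenizer suffices. Chinese text would need a word
    segmenter instead.
    """
    sentences = re.split(r"[.!?]+", text)
    sequences = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    return [seq for seq in sequences if seq]


def mine_pairs(sequences, min_support=2, max_gap=2):
    """Mine ordered word pairs (a, b) where b follows a within max_gap positions.

    Support is the number of sequences that contain the pair at least once.
    Only pairs with support >= min_support are kept.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pair at most once per sequence
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + max_gap]:
                if a != b:
                    seen.add((a, b))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


def score_words(sequences, patterns):
    """Combine a statistical feature (term frequency) with a pattern feature
    (total support of the frequent pairs a word takes part in).

    Assumption: the product tf * (1 + pattern_support) stands in for a
    trained model.
    """
    tf = Counter(word for seq in sequences for word in seq)
    pattern_support = Counter()
    for (a, b), n in patterns.items():
        pattern_support[a] += n
        pattern_support[b] += n
    return {word: tf[word] * (1 + pattern_support[word]) for word in tf}


if __name__ == "__main__":
    text = (
        "Pattern mining finds frequent patterns. "
        "Pattern mining scales well. "
        "Keyword extraction uses pattern mining."
    )
    sequences = to_sequences(text)
    patterns = mine_pairs(sequences)
    scores = score_words(sequences, patterns)
    for word in sorted(scores, key=scores.get, reverse=True)[:2]:
        print(word, scores[word])
```

In this toy input, only the pair ("pattern", "mining") reaches the support threshold, so those two words score well above the rest. A fuller treatment would mine longer patterns and feed the features to a learned classifier rather than a fixed product.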