TextRank Keyword Extraction Method Weighted by Multivariate Quantitative Indexes

Xin Luan, Wenya Gao, Ming Chen, Dalei Song
DOI: https://doi.org/10.1117/12.2626538
2021-01-01
Abstract: Keyword extraction from news text has its own particularities: it must account not only for differences in the quantitative indexes of words but also for the influence of phrases. To improve keyword extraction for news text, this paper constructs a keyword graph based on TextRank and improves the probability transition matrix by combining four quantitative indicators of each node (frequency, location, span, and part of speech), so that words receive different weights. Because word segmentation affects phrase extraction, phrases are reconstructed according to a recombination rule, and the concept of combinatorial entropy is defined to filter the reconstructed phrases. Each reconstructed phrase is then assigned a linearly weighted value based on its statistical quantitative indexes, and finally the top-N words or phrases are selected as keywords according to their weight. Experimental results show that the proposed algorithm not only outperforms the traditional TextRank and TF-IDF algorithms but also has clear advantages over the improved PositionRank and MyWPMWRank algorithms, with the F value increasing by up to 9.75%, which effectively improves keyword extraction for news text.
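The idea of biasing the TextRank transition matrix by node weights can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the formula that combines frequency, location, span, and part of speech, so the sketch takes a precomputed positive weight per word (the hypothetical `node_weight` argument) and assumes that a neighbor passes on its score in proportion to the receiving word's weight. The function name, window size, damping factor, and iteration count are likewise illustrative assumptions.

```python
from collections import defaultdict


def weighted_textrank(tokens, node_weight, window=2, damping=0.85, iters=50):
    """Rank words with TextRank, biasing transitions toward high-weight nodes.

    node_weight maps a word to a positive number; words missing from it
    default to 1.0. How that number is derived from frequency, location,
    span, and part of speech is left open here (assumption).
    """
    # Build an undirected co-occurrence graph over a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1 : i + 1 + window]:
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    words = list(neighbors)
    if not words:
        return []

    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        new = {}
        for w in words:
            incoming = 0.0
            for u in neighbors[w]:
                # Transition probability u -> w: w's weight normalized
                # over the weights of all of u's neighbors.
                total = sum(node_weight.get(v, 1.0) for v in neighbors[u])
                incoming += score[u] * node_weight.get(w, 1.0) / total
            new[w] = (1 - damping) / len(words) + damping * incoming
        score = new
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Usage: take the top-N entries as candidate keywords.
tokens = "market report stock price market news stock report".split()
top_two = weighted_textrank(tokens, {"stock": 3.0})[:2]
```

With uniform weights this reduces to ordinary unweighted TextRank on the co-occurrence graph; raising a word's weight shifts transition probability toward it from each of its neighbors. Phrase reconstruction and entropy-based filtering are not covered by this sketch.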