Abstract: Keyword extraction is a cornerstone of natural language processing: text classification, information retrieval, abstract generation, and text clustering all build on it. This article studies keyword extraction models. First, it analyzes the principle and the limitations of the traditional TF-IDF keyword extraction model. Second, it addresses two problems that this model ignores, context and word polysemy: it introduces the concept of a context vector, constructs a chain-extensible structure for polysemous words, and proposes a new TF-IDF keyword extraction method that combines context and semantic classification. The specific research contents are as follows. TF-IDF extracts keywords with reference to multiple texts: the target keywords are words that appear frequently in the current text but significantly less frequently in other articles. This method captures the content that is characteristic of an article and distinguishes the article from others. However, it ignores the influence of context and the problem of word polysemy, even though the ideas an article expresses depend on both the context in which words appear and the senses of those words. To solve these problems, this paper proposes a TF-IDF keyword extraction method that combines context and semantic classification. First, an extensible linked-list structure is constructed to handle polysemy. Then the influence of context is introduced: before a new word is inserted into the extensible structure, its semantic similarity is calculated from the context vector. Finally, the entire extensible structure is traversed, the word frequency and semantic similarity are summed under each sense, and the results are sorted to obtain the top N words as the keywords of the corresponding article. The Sogou news data set is used as the input data of the model, and the accuracy of the experimental results is significantly better than that of TF-IDF.
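
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the context vector is taken to be a bag of words in a small window around each occurrence, semantic similarity is taken to be cosine similarity, a hypothetical `threshold` decides whether an occurrence joins an existing sense or opens a new one, each occurrence contributes its frequency (1) plus its similarity to the sense score, and the IDF term uses a common smoothed form. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def context_vector(tokens, i, window=2):
    """Assumed definition: counts of the words within `window` positions of i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return Counter(tokens[lo:i] + tokens[i + 1:hi])


def sense_chains(tokens, threshold=0.3, window=2):
    """Map each word to a chain (list) of senses.

    Each sense stores an accumulated context vector and a score that sums
    frequency (1 per occurrence) and the similarity of merged occurrences.
    """
    chains = {}
    for i, word in enumerate(tokens):
        ctx = context_vector(tokens, i, window)
        senses = chains.setdefault(word, [])
        best, best_sim = None, 0.0
        for sense in senses:
            sim = cosine(ctx, sense["context"])
            if sim > best_sim:
                best, best_sim = sense, sim
        if best is not None and best_sim >= threshold:
            best["context"].update(ctx)      # merge into the closest sense
            best["score"] += 1.0 + best_sim  # frequency plus similarity
        else:
            senses.append({"context": ctx, "score": 1.0})  # extend the chain
    return chains


def idf(word, doc_sets):
    """Smoothed inverse document frequency (an assumed variant)."""
    df = sum(1 for d in doc_sets if word in d)
    return math.log((1 + len(doc_sets)) / (1 + df)) + 1.0


def top_keywords(doc, corpus, n=3, threshold=0.3, window=2):
    """Return the n best (word, sense_index) pairs for `doc` within `corpus`."""
    doc_sets = [set(d) for d in corpus]
    scored = []
    for word, senses in sense_chains(doc, threshold, window).items():
        for k, sense in enumerate(senses):
            weight = sense["score"] / len(doc) * idf(word, doc_sets)
            scored.append((weight, word, k))
    scored.sort(reverse=True)
    return [(word, k) for _, word, k in scored[:n]]


if __name__ == "__main__":
    doc = ("river bank water river bank water "
           "loan bank money loan bank money").split()
    corpus = [doc,
              "the cat sat on the mat".split(),
              "river flows to the sea".split()]
    print(top_keywords(doc, corpus, n=2, window=1))
```

In this toy document the word "bank" occurs in two disjoint contexts (river/water and loan/money), so the sketch keeps two separate senses for it rather than one merged count, which is the behavior the chain structure is meant to provide. The threshold and window size would need tuning on real data such as the Sogou news corpus.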

A new network model for extracting text keywords

Exploring Simultaneous Keyword and Key Sentence Extraction

Exploring simultaneous keyword and key sentence extraction: improve graph-based ranking using Wikipedia.

A Way to Improve Graph-Based Keyword Extraction

Automatic Keyword Extraction Based on Phrase Network

WS-rank: Bringing sentences into graph for keyword extraction

Bert-Based Text Keyword Extraction

Research on Weighted Complex Network Based Keywords Extraction.

TF-IDF Keyword Extraction Method Combining Context and Semantic Classification

News keyword extraction algorithm based on semantic clustering and word graph model

Complex Network based Supervised Keyword Extractor

Keywords Extraction via Multi-relational Network Construction

Keyword Extraction Approach Based on Probabilistic-Entropy, Graph, and Neural Network Methods

Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction.

General-use unsupervised keyword extraction model for keyword analysis

The enhancement of TextRank algorithm by using word2vec and its application on topic extraction

A Modified Approach To Keyword Extraction Based On Word-Similarity

Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

Improving TextRank Algorithm for Automatic Keyword Extraction with Tolerance Rough Set

WordTopic-MultiRank: A New Method for Automatic Keyphrase Extraction.

Using citation networks to evaluate the impact of text length on keyword extraction