Abstract: Keyphrases provide semantic metadata that summarizes the content of a document, and they are used in many text-mining applications. This paper proposes a new method that improves automatic keyphrase extraction by using semantic information about candidate keyphrases. The method works in two stages. In the candidate selection stage, after all phrases are extracted from the document, a word sense disambiguation method is used to obtain the senses of the phrases; term conflation is then performed using case folding, stemming, and semantic relatedness between candidates. In the filtering stage, four features are computed for each candidate: the TFxIDF measure, which describes the specificity of a phrase; the first occurrence of the phrase in the document; the length of the phrase; and a coherence score, which measures the semantic relatedness between the phrase and the other candidates. A Naive Bayes scheme builds a prediction model from training data with known keyphrases and then uses the model to calculate the overall probability for each candidate. We evaluate the semantically improved method against the well-known Kea system using a more effective, semantically enhanced evaluation method. The inter-domain experiment shows that the quality of keyphrase extraction can be improved significantly when semantic information is exploited. The intra-domain experiment shows that our method is competitive with the Kea++ algorithm and is not domain-specific.
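
As a rough illustration of the filtering stage, the sketch below computes three of the four features named in the abstract (TFxIDF, first occurrence, and phrase length) for a single candidate. It is not the authors' implementation: the function name `candidate_features`, the add-one smoothing in the IDF term, and the normalization of first occurrence by document length are assumptions made for this example. The coherence score and the Naive Bayes model are omitted because the abstract does not specify how they are computed.

```python
import math


def candidate_features(phrase, doc_tokens, doc_freq, num_docs):
    """Return (tfidf, first_occurrence, length) for one candidate phrase.

    phrase     -- candidate phrase as a space-separated string
    doc_tokens -- the document as a list of (already normalized) tokens
    doc_freq   -- number of training documents containing the phrase
    num_docs   -- total number of training documents
    All names and the smoothing below are assumptions for illustration.
    """
    words = phrase.split()
    n = len(words)
    # Token offsets at which the phrase starts in the document.
    positions = [
        i for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i:i + n] == words
    ]
    if not positions:
        return None  # phrase does not occur in this document

    # Term frequency relative to document size.
    tf = len(positions) / len(doc_tokens)
    # Inverse document frequency with add-one smoothing (assumed),
    # so that an unseen phrase does not produce log(0).
    idf = -math.log2((doc_freq + 1) / (num_docs + 1))
    # Fraction of the document that precedes the first occurrence.
    first_occurrence = positions[0] / len(doc_tokens)
    return tf * idf, first_occurrence, n


doc = ("keyphrase extraction improves text mining and "
       "keyphrase extraction helps search").split()
features = candidate_features("keyphrase extraction", doc, doc_freq=1, num_docs=3)
```

In a full pipeline, feature tuples like these would be computed for every candidate and passed to a classifier trained on documents with known keyphrases.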

ON AN IMPROVED NAVE BAYESIAN KEYWORD EXTRACTION ALGORITHM

Exploring simultaneous keyword and key sentence extraction: improve graph-based ranking using Wikipedia.

Exploring Simultaneous Keyword and Key Sentence Extraction

Chinese Keyword Extraction Algorithm Based on Neighbour Words

A Way to Improve Graph-Based Keyword Extraction

Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

Improved Automatic Keyphrase Extraction by Using Semantic Information

Automatic Keyword Extraction Algorithm Research Using BC Method

An Improved Method of Keywords Extraction Based on Short Technology Text.

Algorithm of Chinese Keywords Extraction based on Multi-feature

Research on a Compound Keywords Detection Method Based on Small World Model

Design and Analysis of Genetic Algorithm Based Chinese Keyword Extracting.

SIFRANK Algorithm for Chinese Text Keyword Extraction Based on Dependent Semantic Feature Constraints

Domain Term Extraction Method Based on Hierarchical Combination Strategy for Chinese Web Documents

Machine Learning for Keyphrases Extraction Based on Naive Bayesian Classifier

Chinese Keyword Extraction Based on N-Gram and Word Co-occurrence

Improved Automatic Keyword Extraction Given More Semantic Knowledge

Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction.

Research on the Chinese Keyword Extraction Algorithm Based on Separate Models

Keyword Extraction Based on New Word Detection

Semantically Improved Automatic Keyphrase Extraction