Abstract: Keyphrases provide semantic metadata that summarizes the content of a document, and they are used in many text-mining applications. This paper proposes a new method that improves automatic keyphrase extraction by using semantic information about candidate keyphrases. The method has two stages. In the candidate selection stage, all phrases are extracted from the document, a word sense disambiguation method is used to obtain the senses of the phrases, and term conflation is then performed using case folding, stemming, and semantic relatedness between candidates. In the filtering stage, four features are computed for each candidate: the TFxIDF measure, which describes the specificity of a phrase; the first occurrence of the phrase in the document; the length of the phrase; and a coherence score, which measures the semantic relatedness between the phrase and the other candidates. A Naive Bayes scheme builds a prediction model from training data with known keyphrases and then uses the model to calculate the overall probability for each candidate. We evaluate the semantically improved method against the well-known Kea system using a more effective, semantically enhanced evaluation method. The inter-domain experiment shows that the quality of keyphrase extraction can be improved significantly when semantic information is exploited. The intra-domain experiment shows that our method is competitive with the Kea++ algorithm and is not domain-specific.
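
The four filtering features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `candidate_features`, the smoothed IDF formula, the normalization of the first-occurrence position, and the caller-supplied `relatedness` function are all assumptions made for illustration, since the abstract does not specify them. The coherence score here is simply the mean relatedness to the other candidates.

```python
import math


def candidate_features(phrase, doc_tokens, doc_freq, num_docs, relatedness, others):
    """Return (tfidf, first_occurrence, length, coherence) for one candidate.

    All parameter names are hypothetical:
      doc_tokens  -- the document as a list of (already normalized) tokens
      doc_freq    -- dict mapping a phrase to the number of corpus documents containing it
      num_docs    -- number of documents in the corpus
      relatedness -- caller-supplied function (phrase_a, phrase_b) -> float
      others      -- the other candidate phrases of the same document
    """
    words = phrase.split()
    n = len(words)

    # Token offsets at which the phrase occurs in the document.
    positions = [
        i for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i:i + n] == words
    ]
    if not positions:
        raise ValueError(f"phrase {phrase!r} does not occur in the document")

    # Term frequency relative to document length.
    tf = len(positions) / len(doc_tokens)

    # Smoothed inverse document frequency (an assumed variant, chosen so that
    # unseen phrases do not cause a division by zero).
    idf = math.log((num_docs + 1) / (doc_freq.get(phrase, 0) + 1))

    # Relative position of the first occurrence: 0.0 means the very start.
    first_occurrence = positions[0] / len(doc_tokens)

    # Coherence: mean relatedness between this phrase and the other candidates.
    coherence = (
        sum(relatedness(phrase, other) for other in others) / len(others)
        if others else 0.0
    )

    return tf * idf, first_occurrence, n, coherence
```

In a pipeline like the one described, such feature tuples for phrases with known keyphrase labels would serve as training data for the Naive Bayes model, which then scores the candidates of unseen documents.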

Automatic keyphrase extraction from Chinese news documents

A Combining Approach to Automatic Keyphrases Indexing for Chinese News Documents

Keyphrase Extraction from Chinese News Web Pages Based on Semantic Relations

An Automatic Online News Topic Keyphrase Extraction System

Extracting Keyphrases from Chinese News Articles Using TextRank and Query Log Knowledge

Automatic Keyphrase Extraction from Chinese Books

Improving Keyphrase Extraction from Web News by Exploiting Comments Information

Improved Automatic Keyphrase Extraction by Using Semantic Information

Research of Chinese Key-Phrase Extraction Based on Lexical Rule and Apriori Algorithm

Automatic Keyphrase Extraction by Bridging Vocabulary Gap

Keyphrases automatic extraction from the abstracts of English scientific papers based on Scopus retrieval

News-oriented Automatic Chinese Keyword Indexing

Keyword Extraction Based on Tf/idf for Chinese News Document

An Automatic Keyphrase Extraction System for Scientific Documents

Learning to extract coherent keyphrases from online news

Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages

Automatic Keyphrase Extraction with A Refined Candidate Set

Mining Construction Rules of Chinese Keyphrase Based on Rough Set Theory

Chinese Keyword Extraction Algorithm Based on Neighbour Words

KeyphraseDS: Automatic Generation of Survey by Exploiting Keyphrase Information

SJTULTLAB: Chunk based method for keyphrase extraction