Keyphrases automatic extraction from the abstracts of English scientific papers based on Scopus retrieval

Lielei Chen, Hui Fang
DOI: https://doi.org/10.13232/j.cnki.jnju.2018.03.016
2018-01-01
Abstract: Automatic keyphrase extraction is increasingly used in scientific publishing. Objective and accurate keyphrases are used to cluster documents in databases, and suitable keyphrases help researchers find relevant papers. This paper proposes a method based on TF-IDF (Term Frequency-Inverse Document Frequency) and Scopus database retrieval to extract keyphrases automatically from the abstracts of English scientific papers. Our method treats all documents indexed in the Scopus database as the corpus and uses the Scopus API to retrieve candidates from the database automatically. Compared with traditional approaches that rely on a manually built and annotated corpus, our method is more convenient and draws on richer data. Because abstracts contain less redundant information, keyphrases are extracted from the abstracts using statistical features of the full text. We model the structural characteristics of abstracts and introduce a position feature for candidates. In addition, two types of stop-word lists were developed to exclude noisy candidates and improve performance. The experimental results show that our method performs well.
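The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the smoothed IDF, the `position_boost` value, the stop-word list, and the function names are all assumptions made for illustration. The `doc_freq` callable is a hypothetical stand-in for a Scopus lookup that would return how many indexed documents contain a phrase; no real Scopus API call is shown.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption; the paper's two lists are not given).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "are"}


def candidate_phrases(text, max_len=3):
    """Yield (phrase, sentence_index) for n-grams containing no stop word."""
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    for idx, sentence in enumerate(sentences):
        words = re.findall(r"[a-z][a-z\-]*", sentence)
        for n in range(1, max_len + 1):
            for start in range(len(words) - n + 1):
                gram = words[start:start + n]
                if not any(w in STOP_WORDS for w in gram):
                    yield " ".join(gram), idx


def score_candidates(text, doc_freq, total_docs, position_boost=1.5):
    """Score candidates by TF * smoothed IDF * position weight.

    doc_freq: callable mapping a phrase to its document count in the
    corpus (hypothetical stand-in for Scopus retrieval).
    position_boost: assumed weight for phrases first seen in sentence 0.
    """
    term_freq = Counter()
    first_seen = {}
    for phrase, idx in candidate_phrases(text):
        term_freq[phrase] += 1
        first_seen.setdefault(phrase, idx)

    scores = {}
    for phrase, count in term_freq.items():
        # Add-one smoothing keeps the IDF finite when a phrase is unseen.
        idf = math.log((1 + total_docs) / (1 + doc_freq(phrase)))
        weight = position_boost if first_seen[phrase] == 0 else 1.0
        scores[phrase] = count * idf * weight
    return scores


if __name__ == "__main__":
    # Made-up document counts, used only to exercise the sketch.
    fake_counts = {"keyphrase extraction": 100, "corpus": 500_000}
    sample = ("Keyphrase extraction helps retrieval. "
              "Keyphrase extraction uses a corpus.")
    result = score_candidates(sample, lambda p: fake_counts.get(p, 0), 1_000_000)
    for phrase, value in sorted(result.items(), key=lambda kv: -kv[1])[:3]:
        print(f"{phrase}: {value:.2f}")
```

Passing the corpus lookup as a callable keeps the scoring logic independent of how document counts are obtained, so a cached or remote lookup could be swapped in without changing the rest of the sketch.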