Scientific Keyphrase Extraction: Extracting Candidates with Semi-supervised Data Augmentation.

Qianying Liu, Daisuke Kawahara, Sujian Li
DOI: https://doi.org/10.1007/978-3-030-01716-3_16
2018-01-01
Abstract: Keyphrase extraction can provide effective ways of organizing scientific documents. For this task, neural methods usually suffer from unstable performance due to data scarcity. In this paper, we adopt a two-step pipeline consisting of candidate extraction and keyphrase ranking, where candidate extraction is key to overall performance. In the candidate extraction step, to overcome the low-recall problem of traditional rule-based methods, we propose a novel semi-supervised data augmentation method, in which a neural tagging model and a discriminative classifier boost each other to obtain more confident phrases as candidates. With more reasonable candidates, keyphrases are identified with higher recall. Experiments on SemEval 2017 Task 10 show that our model achieves competitive results.
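The mutual-boosting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the "tagger" and "classifier" below are toy, rule-like stand-ins for the neural tagging model and the discriminative classifier, and every name (`train_tagger`, `train_classifier`, `augment`, `SEED`, `UNLABELED`) and the 0.6 threshold are assumptions made purely for illustration. The sketch only shows the loop structure: both components are refit on the current set of confident phrases, the tagger proposes candidates from unlabeled text, and the classifier decides which proposals are confident enough to join the set.

```python
from typing import Callable, Iterator, List, Set


def ngrams(tokens: List[str], min_len: int = 2, max_len: int = 3) -> Iterator[str]:
    """Yield every contiguous n-gram with min_len <= n <= max_len."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def train_tagger(phrases: Set[str]) -> Callable[[List[str]], List[str]]:
    """Toy stand-in for a tagging model (assumption, not the paper's tagger):
    proposes n-grams whose last word is the last word of a known phrase."""
    heads = {p.split()[-1] for p in phrases}

    def tag(tokens: List[str]) -> List[str]:
        return [g for g in ngrams(tokens) if g.split()[-1] in heads]

    return tag


def train_classifier(phrases: Set[str]) -> Callable[[str], float]:
    """Toy stand-in for a discriminative classifier (assumption):
    scores a phrase by the fraction of its words seen in known phrases."""
    vocab = {w for p in phrases for w in p.split()}

    def score(phrase: str) -> float:
        words = phrase.split()
        return sum(w in vocab for w in words) / len(words)

    return score


def augment(seed: Set[str], unlabeled: List[str],
            rounds: int = 3, threshold: float = 0.6) -> Set[str]:
    """Grow the set of confident phrases by alternating the two components."""
    confident = set(seed)
    for _ in range(rounds):
        # Refit both components on everything accepted so far.
        tag = train_tagger(confident)
        score = train_classifier(confident)
        # The tagger proposes; the classifier keeps only confident proposals.
        accepted = {p for sent in unlabeled
                    for p in tag(sent.split()) if score(p) >= threshold}
        if accepted <= confident:  # nothing new was accepted: stop early
            break
        confident |= accepted
    return confident


# Hypothetical toy data, not taken from SemEval 2017 Task 10.
SEED = {"neural network", "keyphrase extraction"}
UNLABELED = [
    "we train a recurrent neural network",
    "recurrent network models converge",
    "candidate extraction improves recall",
]

if __name__ == "__main__":
    print(sorted(augment(SEED, UNLABELED)))
```

In this toy run, "recurrent network" is rejected in the first round and accepted only after "recurrent neural network" has been added, which is the sense in which each round's accepted phrases make the next round's proposals more reliable. A real system would replace both stand-ins with trained models and would need a guard against error propagation from wrongly accepted phrases.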