Enhancing unsupervised keyphrase extraction through the integration of structural details in embedding-based approaches

Sharma, Saurabh
DOI: https://doi.org/10.1007/s11042-024-19648-0
IF: 2.577
2024-06-27
Multimedia Tools and Applications
Abstract: Computational linguistics, or natural language processing (NLP), emerged to enable systems to automatically identify and extract keyphrases from human-language texts, to mitigate the exploitation of digital sources. To meet the increasing demand for keyphrase extraction tools, researchers are actively developing new tools that claim to be capable of processing any type of document in any field. In this article, an unsupervised word-embedding-based approach for keyphrase extraction is proposed. The proposed method enhances state-of-the-art word embeddings through the use of n-grams. Additionally, the method introduces a unique way to create word vectors by considering significant word vectors and their IDF scores. Our model achieves an F-score of 0.495 using the combination of GloVe, unigrams, and bigrams. The combination of Skip-Gram with unigrams and bigrams also obtains better results on the DUC, SemEval, KDD, and Inspec datasets than state-of-the-art methods.
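The abstract does not spell out how the IDF scores and n-grams are combined with the embeddings, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a phrase vector is the IDF-weighted average of its word vectors, that candidates are the unigrams and bigrams of the document, and that candidates are ranked by cosine similarity to the document vector. All function names, the toy three-dimensional vectors (standing in for GloVe or Skip-Gram embeddings), and the smoothed IDF formula are assumptions made for illustration.

```python
import math
from collections import Counter


def idf_scores(corpus):
    """Smoothed inverse document frequency for each token in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def weighted_vector(tokens, vectors, idf):
    """IDF-weighted average of the vectors of known tokens (None if none are known)."""
    known = [t for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(vectors[known[0]])
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in known:
        weight = idf.get(tok, 1.0)  # assumption: unseen tokens get neutral weight
        weight_sum += weight
        for i, value in enumerate(vectors[tok]):
            total[i] += weight * value
    return [value / weight_sum for value in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidates(tokens):
    """Unigram and bigram candidates, de-duplicated in order of first appearance."""
    grams = [(t,) for t in tokens] + list(zip(tokens, tokens[1:]))
    return list(dict.fromkeys(grams))


def rank_keyphrases(doc, corpus, vectors, top_k=3):
    """Rank candidates by similarity between phrase vector and document vector."""
    idf = idf_scores(corpus)
    doc_vec = weighted_vector(doc, vectors, idf)
    if doc_vec is None:
        return []
    scored = []
    for gram in candidates(doc):
        # Keep only candidates whose words all have an embedding.
        if all(tok in vectors for tok in gram):
            phrase_vec = weighted_vector(gram, vectors, idf)
            scored.append((" ".join(gram), cosine(phrase_vec, doc_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical toy data; real use would load pretrained embeddings.
    toy_vectors = {
        "keyphrase": [0.9, 0.1, 0.0],
        "extraction": [0.8, 0.2, 0.1],
        "embedding": [0.7, 0.3, 0.0],
        "weather": [0.0, 0.1, 0.9],
    }
    toy_corpus = [
        ["keyphrase", "extraction", "uses", "embedding"],
        ["embedding", "models", "weather"],
        ["weather", "forecast"],
    ]
    for phrase, score in rank_keyphrases(toy_corpus[0], toy_corpus, toy_vectors):
        print(f"{phrase}\t{score:.3f}")
```

Weighting by IDF lets rarer, more document-specific words dominate a phrase or document vector, which is one way to read "significant word vectors and their IDF scores"; the paper itself should be consulted for the actual weighting and candidate-selection scheme.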
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering