BibRank: Automatic Keyphrase Extraction Platform Using Metadata

Abdelrhman Eldallal, Eduard Barbu
DOI: https://doi.org/10.3390/info14100549
2023-10-13
Abstract: Automatic Keyphrase Extraction involves identifying essential phrases in a document. These keyphrases are crucial in various tasks such as document classification, clustering, recommendation, indexing, searching, summarization, and text simplification. This paper introduces a platform that integrates keyphrase datasets and facilitates the evaluation of keyphrase extraction algorithms. The platform includes BibRank, an automatic keyphrase extraction algorithm that leverages a rich dataset obtained by parsing bibliographic data in BibTeX format. BibRank combines innovative weighting techniques with positional, statistical, and word co-occurrence information to extract keyphrases from documents. The platform proves valuable for researchers and developers seeking to enhance their keyphrase extraction algorithms and advance the field of natural language processing.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem of Automatic Keyphrase Extraction (AKE), particularly for scientific literature. It proposes a platform, BibRank, that integrates multiple keyphrase datasets and provides a way to evaluate the performance of keyphrase extraction algorithms. The main contributions of the paper are:

1. **BibRank Dataset**: An information-rich dataset built by parsing publicly available bibliographic data in BibTeX format, which includes manually assigned keyphrases.
2. **BibRank Algorithm**: A new keyphrase extraction method that leverages bibliographic information and statistical data from the BibRank dataset.
3. **BibRank Platform**: A downloadable platform that integrates the BibRank dataset, the BibRank algorithm, and other state-of-the-art keyphrase extraction algorithms. The platform also includes evaluation metrics and allows additional keyphrase extraction algorithms and datasets to be integrated.
4. **Manual Evaluation of Keyphrases**: A gold standard dataset is used as a benchmark for keyphrase extraction algorithms, and expert human evaluators judge the quality and effectiveness of these algorithms.

To validate the proposed algorithm and platform, the paper reports experimental evaluations, including standard performance metrics for the algorithms (recall, precision, and F1 score) and an evaluation of the gold standard dataset itself. These evaluations test the accuracy of the algorithms and also examine the quality of existing gold standard datasets.
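
The recall, precision, and F1 metrics mentioned above can be illustrated with a minimal sketch. It assumes exact matching after lowercasing and whitespace normalization; the function names and the sample phrases are hypothetical and are not taken from the BibRank platform, which may use a different matching scheme (for example, stemming).

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(extracted, gold):
    """Score extracted keyphrases against gold keyphrases by exact match.

    Assumption: duplicates are collapsed, and a phrase counts as correct
    only if its normalized form appears in the gold set.
    """
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    matches = len(extracted_set & gold_set)

    precision = matches / len(extracted_set) if extracted_set else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: 2 of the 4 extracted phrases match the 3 gold phrases.
extracted = ["Keyphrase Extraction", "BibTeX", "neural network", "indexing"]
gold = ["keyphrase extraction", "bibtex", "summarization"]
precision, recall, f1 = precision_recall_f1(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is the strictest choice: near-misses such as "keyphrase extractor" score zero, which is one reason human evaluation can complement metrics computed against a gold standard.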