BibRank: Automatic Keyphrase Extraction Platform Using Metadata

Abdelrhman Eldallal, Eduard Barbu

DOI: https://doi.org/10.3390/info14100549

2023-10-13

Abstract: Automatic Keyphrase Extraction involves identifying essential phrases in a document. These keyphrases are crucial in various tasks such as document classification, clustering, recommendation, indexing, searching, summarization, and text simplification. This paper introduces a platform that integrates keyphrase datasets and facilitates the evaluation of keyphrase extraction algorithms. The platform includes BibRank, an automatic keyphrase extraction algorithm that leverages a rich dataset obtained by parsing bibliographic data in BibTeX format. BibRank combines innovative weighting techniques with positional, statistical, and word co-occurrence information to extract keyphrases from documents. The platform proves valuable for researchers and developers seeking to enhance their keyphrase extraction algorithms and advance the field of natural language processing.
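To make the idea of combining positional and word co-occurrence information concrete, here is a minimal sketch of a position-biased PageRank over a co-occurrence graph. It is an illustration only and not the BibRank algorithm: the function name `rank_words`, the window size, the damping factor, and the inverse-position bias are all assumptions chosen for the example, and the BibTeX-derived weighting described in the abstract is not modelled.

```python
from collections import defaultdict


def rank_words(tokens, window=2, damping=0.85, iterations=50):
    """Score words with a position-biased PageRank (illustrative sketch)."""
    # Co-occurrence graph: undirected edges between distinct words that
    # appear within `window` tokens of each other, weighted by count.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    # Positional bias: each occurrence contributes 1/position, so words
    # that appear early (and often) get a larger share of the restart mass.
    bias = defaultdict(float)
    for position, word in enumerate(tokens, start=1):
        bias[word] += 1.0 / position
    total_bias = sum(bias.values())
    bias = {word: value / total_bias for word, value in bias.items()}

    vocab = list(bias)
    if not vocab:
        return {}
    out_weight = {word: sum(weights[word].values()) for word in vocab}
    scores = {word: 1.0 / len(vocab) for word in vocab}

    # Power iteration: restart according to the positional bias, otherwise
    # follow co-occurrence edges in proportion to their weight.
    for _ in range(iterations):
        updated = {}
        for word in vocab:
            incoming = sum(
                scores[neighbor] * weights[neighbor][word] / out_weight[neighbor]
                for neighbor in weights[word]
                if out_weight[neighbor] > 0
            )
            updated[word] = (1 - damping) * bias[word] + damping * incoming
        scores = updated
    return scores


tokens = "keyphrase extraction platform keyphrase extraction metadata".split()
scores = rank_words(tokens)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['keyphrase', 'extraction', 'platform', 'metadata']
```

In this toy input, words that occur early and repeatedly rank above a word that appears once at the end. A real extractor would additionally filter candidates by part of speech and merge adjacent high-scoring words into multi-word keyphrases.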

Computation and Language

What problem does this paper attempt to address?

The paper addresses the problem of Automatic Keyphrase Extraction (AKE), particularly in the context of scientific literature. Specifically, it proposes a platform called BibRank, which integrates a large number of keyphrase datasets and provides a method for evaluating the performance of keyphrase extraction algorithms. The main contributions of the paper are:

1. **BibRank Dataset**: An information-rich dataset constructed by parsing publicly available bibliographic data in BibTeX format, which includes manually assigned keyphrases.
2. **BibRank Algorithm**: A new keyphrase extraction method, the BibRank algorithm, which leverages bibliographic information and statistical data from the BibRank dataset.
3. **BibRank Platform**: A downloadable platform that integrates the BibRank dataset, the BibRank algorithm, and other state-of-the-art keyphrase extraction algorithms. The platform also includes evaluation metrics and allows for the integration of additional keyphrase extraction algorithms and datasets.
4. **Manual Evaluation of Keyphrases**: Evaluation of keyphrase extraction algorithms against a gold standard dataset, complemented by expert human evaluators who judge the quality and effectiveness of these algorithms.

To validate the proposed algorithm and platform, the paper also conducts detailed experimental evaluations, including the calculation of standard performance metrics for the algorithms (recall, precision, and F1 score) and an evaluation of the gold standard dataset itself. These evaluations test the accuracy of the algorithms and also examine the quality of the existing gold standard datasets.
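The standard metrics mentioned above can be illustrated with a minimal sketch. It assumes exact matching after lowercasing and whitespace normalization, which is one common convention and not necessarily the matching rule used by the BibRank platform; the function names `normalize` and `evaluate_keyphrases` and the sample phrases are hypothetical.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(phrase.lower().split())


def evaluate_keyphrases(predicted, gold, k=None):
    """Return exact-match (precision, recall, F1) of predicted vs. gold keyphrases.

    `predicted` is assumed to be ordered by rank; `k` optionally keeps only
    the top-k predictions before scoring.
    """
    # Deduplicate predictions while preserving their rank order.
    unique_predictions = []
    for phrase in predicted:
        normalized = normalize(phrase)
        if normalized not in unique_predictions:
            unique_predictions.append(normalized)
    if k is not None:
        unique_predictions = unique_predictions[:k]

    gold_set = {normalize(phrase) for phrase in gold}
    matches = sum(1 for phrase in unique_predictions if phrase in gold_set)

    precision = matches / len(unique_predictions) if unique_predictions else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


predicted = ["keyphrase extraction", "BibTeX", "neural network", "Keyphrase  Extraction"]
gold = ["Keyphrase extraction", "bibtex", "metadata"]

precision, recall, f1 = evaluate_keyphrases(predicted, gold)
top1 = evaluate_keyphrases(predicted, gold, k=1)
print(precision, recall, f1)
print(top1)
```

Here two of the three distinct predictions match the three gold keyphrases, so precision, recall, and F1 all equal 2/3. Exact matching is strict: a prediction such as "keyphrase extractor" would count as a miss, which is one reason human judgment of extracted keyphrases can be a useful complement to gold standard scoring.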
