S2abEL: A Dataset for Entity Linking from Scientific Tables

Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey
2023-04-30
Abstract: Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific tables. EL for scientific tables is especially challenging because scientific knowledge bases can be very incomplete, and disambiguating table mentions typically requires understanding the paper's text in addition to the table. Our dataset, S2abEL, focuses on EL in machine learning results tables and includes hand-labeled cell types, attributed sources, and entity links from the PaperswithCode taxonomy for 8,429 cells from 732 tables. We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions, and show that it significantly outperforms a state-of-the-art generic table EL method. The best baselines fall below human performance, and our analysis highlights avenues for improvement.
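To make the out-of-knowledge-base challenge concrete, here is a minimal Python sketch of linking a table-cell mention with an explicit "no link" (NIL) option. It is not the paper's neural baseline: it swaps in a simple character-trigram Jaccard similarity, and the `Entity` class, the `link` function, and the 0.5 threshold are hypothetical choices for illustration. It also ignores the surrounding paper text, which the abstract notes is usually needed for disambiguation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A knowledge-base entry (hypothetical structure for illustration)."""
    entity_id: str
    name: str
    aliases: List[str] = field(default_factory=list)


def char_ngrams(text: str, n: int = 3) -> set:
    """Lower-cased character n-grams; padding gives short strings edge n-grams."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two strings' character trigrams, in [0, 1]."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def link(mention: str, kb: List[Entity], threshold: float = 0.5) -> Optional[str]:
    """Return the ID of the best-matching entity, or None (NIL) if nothing scores
    at least `threshold`, i.e. the mention is treated as out-of-KB."""
    best_id: Optional[str] = None
    best_score = 0.0
    for entity in kb:
        # Score against the canonical name and every alias; keep the best.
        score = max(similarity(mention, s) for s in [entity.name, *entity.aliases])
        if score > best_score:
            best_id, best_score = entity.entity_id, score
    # Abstain rather than force a link when the best candidate is too weak.
    return best_id if best_score >= threshold else None
```

With a small KB holding BERT and ResNet entries, `link("BERT-base", kb)` returns `"bert"` through an exact alias match, while a new method name such as `"MyNovelNet"` scores below the threshold against every entry and returns `None`. The threshold is the key design choice here: when many mentions are out-of-KB, always returning the top candidate would produce many wrong links.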
Computation and Language, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
This paper addresses entity linking (EL) in scientific tables. Specifically:

1. **Lack of datasets**:
   - No prior dataset exists for EL in scientific tables, a setting in which many mentions have no corresponding entry in the knowledge base (out-of-KB mentions).
2. **A new dataset**:
   - The paper introduces S2abEL, the first dataset for EL in scientific tables, focused on machine learning results tables. It contains hand-labeled cell types, attributed sources, and entity links to the PaperswithCode taxonomy for 8,429 cells from 732 tables.
3. **Task challenges**:
   - EL for scientific tables is especially difficult because scientific knowledge bases are often incomplete, and disambiguating a table mention typically requires understanding the paper's text in addition to the table.
4. **Baseline model**:
   - The paper proposes a neural baseline designed for scientific tables with many out-of-KB mentions. It significantly outperforms a state-of-the-art generic table EL method, although the best baselines still fall below human performance.

Through these contributions, the paper aims to advance automated knowledge base construction for science: EL on tables is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics.
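The annotation layers described above can be pictured as one record per table cell. The sketch below is a hypothetical schema, not the released S2abEL file format: the field names, the label set in `CELL_TYPES`, and the `out_of_kb_rate` helper are assumptions for illustration. A missing `entity_id` stands for a mention with no entry in the PaperswithCode taxonomy.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cell-type labels; the dataset's actual label set may differ.
CELL_TYPES = {"method", "dataset", "metric", "other"}


@dataclass
class CellAnnotation:
    """One annotated table cell (hypothetical schema for illustration)."""
    table_id: str
    row: int
    col: int
    text: str
    cell_type: str
    attributed_source: Optional[str] = None  # e.g. ID of the paper the entity is attributed to
    entity_id: Optional[str] = None          # KB entry, or None if the mention is out-of-KB

    def __post_init__(self) -> None:
        # Reject labels outside the (hypothetical) label set early.
        if self.cell_type not in CELL_TYPES:
            raise ValueError(f"unknown cell type: {self.cell_type!r}")


def out_of_kb_rate(cells: List[CellAnnotation]) -> float:
    """Fraction of entity-bearing cells (type other than 'other') with no KB link."""
    linkable = [c for c in cells if c.cell_type != "other"]
    if not linkable:
        return 0.0
    return sum(c.entity_id is None for c in linkable) / len(linkable)
```

A statistic like `out_of_kb_rate` is one simple way to quantify how incomplete the knowledge base is for a given set of tables, which is the property that makes a NIL-aware linker necessary.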