Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking

Hang Dong, Jiaoyan Chen, Yuan He, Yinan Liu, Ian Horrocks
DOI: https://doi.org/10.48550/arXiv.2302.07189
2023-02-14
Computation and Language
Abstract: Discovering entity mentions in text that fall outside a Knowledge Base (KB) plays a critical role in KB maintenance, but has not yet been fully explored. Current methods are mostly limited to simple threshold-based approaches and feature-based classification, and datasets for evaluation are scarce. In this work, we propose BLINKout, a new BERT-based Entity Linking (EL) method that identifies mentions without a corresponding KB entity by matching them to a special NIL entity. To this end, we integrate novel techniques including NIL representation, NIL classification, and synonym enhancement. We also propose Ontology Pruning and Versioning strategies to construct out-of-KB mentions from normal, in-KB EL datasets. Results on four datasets of clinical notes and publications show that BLINKout outperforms existing methods in detecting out-of-KB mentions for the medical ontologies UMLS and SNOMED CT.
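The contrast the abstract draws between a threshold-based baseline and matching to a special NIL entity can be illustrated with a minimal sketch. This is not the BLINKout implementation: the function names, the entity IDs, the score values, and the way the NIL score is supplied are all assumptions made for illustration. The last function sketches the general idea behind Ontology Pruning as described in the abstract, namely relabeling mentions as out-of-KB once their gold entity is removed from the KB.

```python
NIL = "NIL"  # hypothetical label meaning "no corresponding KB entity"


def link_with_threshold(scores, threshold):
    """Threshold baseline: pick the top-scoring entity, or NIL if its score is too low.

    `scores` maps candidate entity IDs to similarity scores (assumed given).
    """
    if not scores:
        return NIL
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else NIL


def link_with_nil_candidate(scores, nil_score):
    """Treat NIL as one more candidate that competes with the KB entities.

    `nil_score` stands in for whatever score a model assigns to the NIL entity.
    """
    candidates = dict(scores)
    candidates[NIL] = nil_score
    return max(candidates, key=candidates.get)


def prune_to_nil(mentions, pruned_entities):
    """Relabel mentions whose gold entity was pruned from the KB as NIL.

    `mentions` is a list of (mention_text, gold_entity_id) pairs.
    """
    return [
        (text, NIL if gold in pruned_entities else gold)
        for text, gold in mentions
    ]


# Hypothetical candidate scores for one mention.
example_scores = {"ent_a": 0.42, "ent_b": 0.31}
```

With these hypothetical scores, `link_with_threshold(example_scores, 0.5)` falls back to NIL because no candidate reaches the threshold, while `link_with_nil_candidate(example_scores, 0.1)` keeps `ent_a` because the NIL candidate scores lower than the best KB entity.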