Abstract

Motivation: Biomedical entity linking (BEL) is the task of grounding entity mentions to a given knowledge base (KB). Recently, neural name-based methods, i.e. systems that identify the most appropriate name in the KB for a given mention using a neural network (via either dense retrieval or autoregressive modeling), have achieved remarkable results on the task without requiring manual tuning or the definition of domain- or entity-specific rules. However, because name-based methods directly return KB names, they cannot cope with homonyms, i.e. different KB entities sharing the exact same name. This significantly affects their performance for KBs where homonyms account for a large share of entity mentions (e.g. UMLS and NCBI Gene).

Results: We present BELHD (Biomedical Entity Linking with Homonym Disambiguation), a new name-based method that copes with this challenge. BELHD builds upon the BioSyn model with two crucial extensions. First, it preprocesses the KB, expanding each homonym with a specifically constructed disambiguating string and thus enforcing unique linking decisions. Second, it introduces candidate sharing, a novel strategy that strengthens the overall training signal by including similar mentions from the same document as positive or negative examples, according to their corresponding KB identifier. Experiments with 10 corpora and 5 entity types show that BELHD improves upon current neural state-of-the-art approaches, achieving the best results in 6 out of 10 corpora with an average improvement of 4.55 percentage points (pp) in recall@1. Furthermore, the KB preprocessing is orthogonal to the prediction model and can therefore also improve other neural methods, which we exemplify for GenBioEL, a generative name-based BEL approach.

Availability and implementation: The code to reproduce our experiments can be found at: https://github.com/sg-wbi/belhd.
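The KB preprocessing step described in the abstract can be illustrated with a minimal sketch. This is not the BELHD implementation: the function name `expand_homonyms`, the toy KB, and the `disambiguators` mapping are hypothetical, and the sketch simply assumes that a disambiguating string is already available for every entity. How BELHD actually constructs that string is not specified in the abstract.

```python
from collections import defaultdict


def expand_homonyms(kb, disambiguators):
    """Append a disambiguating string to every name shared by several entities.

    kb: dict mapping an entity identifier to its list of names.
    disambiguators: dict mapping an entity identifier to a string that
        distinguishes it (assumed to be given; hypothetical input).
    """
    # Collect, for each name (case-insensitive), the identifiers that use it.
    ids_by_name = defaultdict(set)
    for entity_id, names in kb.items():
        for name in names:
            ids_by_name[name.lower()].add(entity_id)

    expanded = {}
    for entity_id, names in kb.items():
        new_names = []
        for name in names:
            if len(ids_by_name[name.lower()]) > 1:
                # Homonym: make the name unique to this entity.
                new_names.append(f"{name} ({disambiguators[entity_id]})")
            else:
                new_names.append(name)
        expanded[entity_id] = new_names
    return expanded


# Toy KB with made-up identifiers: "cold" is a homonym shared by E1 and E2.
kb = {
    "E1": ["cold", "common cold"],
    "E2": ["cold", "cold temperature"],
}
disambiguators = {"E1": "disease", "E2": "temperature"}
expanded_kb = expand_homonyms(kb, disambiguators)
```

After this step every name in the toy KB maps to exactly one identifier, so a name-based linker that returns a name also returns an unambiguous entity.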
