Abstract: Neural entity linking models are very powerful, but run the risk of overfitting to the domain they are trained in. For this problem, a domain is characterized not just by genre of text but even by factors as specific as the particular distribution of entities, as neural models tend to overfit by memorizing properties of frequent entities in a dataset. We tackle the problem of building robust entity linking models that generalize effectively and do not rely on labeled entity linking data with a specific entity distribution. Rather than predicting entities directly, our approach models fine-grained entity properties, which can help disambiguate between even closely related entities. We derive a large inventory of types (tens of thousands) from Wikipedia categories, and use hyperlinked mentions in Wikipedia to distantly label data and train an entity typing model. At test time, we classify a mention with this typing model and use soft type predictions to link the mention to the most similar candidate entity. We evaluate our entity linking system on the CoNLL-YAGO dataset (Hoffart et al., 2011) and show that our approach outperforms prior domain-independent entity linking systems. We also test our approach in a harder setting derived from the WikilinksNED dataset (Eshel et al., 2017) where all the mention-entity pairs are unseen during test time. Results indicate that our approach generalizes better than a state-of-the-art neural model on the dataset.

What problem does this paper attempt to address?

This paper aims to build an entity linking model that generalizes effectively across domains. Although neural entity linking models are powerful, they are prone to overfitting to the domain of their training data, so their performance declines significantly on new domains whose entity distribution differs from the one seen in training. To address this, the authors propose predicting fine-grained entity types instead of predicting entities directly. The typing model is trained on category information from Wikipedia, which improves its ability to generalize.

### Main Contributions

1. **Redefine the entity linking problem as a pure entity type prediction problem**: Address cross-domain generalization by predicting the fine-grained types of an entity instead of predicting the entity directly.
2. **Construct a distantly supervised type-prediction dataset based on Wikipedia categories and hyperlink data**: Use category information and hyperlink data in Wikipedia to train an ultra-fine-grained entity typing model.
3. **Show that in evaluations on two different domains, this model is more effective than other models trained on out-of-domain data**: In particular, the model generalizes better when the mention-entity pairs are unseen.

### Method Overview

- **Data Collection**: Collect sentences containing hyperlinks from Wikipedia and use these hyperlinks as distant supervision signals to generate a dataset for type prediction.
- **Model Architecture**:
  - **Encoder**: Use ELMo and a Bi-LSTM combined with an attention mechanism to encode the mention and its context.
  - **Decoder**: Use a linear layer and a sigmoid function to predict an independent binary probability for each category.
- **Entity Linking Prediction**: Use the trained type-prediction model to select the most similar entity by summing the predicted probabilities of each candidate entity's types.

### Experimental Results

- **CoNLL-YAGO Dataset**: The model achieves an accuracy of 88.1% on the development set and 85.9% on the test set, outperforming prior domain-independent entity linking systems.
- **Unseen-Mentions Dataset**: In the harder setting derived from WikilinksNED, where all mention-entity pairs are unseen, the model generalizes better than a state-of-the-art neural model.

### Discussion

- **Generalization Ability**: By predicting fine-grained types, the model can maintain high performance across domains rather than failing because it overfit to the entity distribution of one domain.
- **Flexibility**: The method depends only on Wikipedia data and does not require domain-specific labeled data, which makes it flexible and extensible.

In conclusion, this paper addresses the limited cross-domain generalization of entity linking models by introducing fine-grained type prediction, offering an effective approach to domain-independent entity linking.
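The distant labeling and entity linking prediction steps from the method overview can be illustrated with a minimal sketch. It is not the authors' implementation: the four-type inventory, the `ENTITY_TYPES` table, the function names, and the logits are all hypothetical stand-ins, and the sketch assumes the candidate score is a plain, unnormalized sum of the sigmoid probabilities of the candidate's types.

```python
import math

# Hypothetical toy inventory; the paper derives tens of thousands of types
# from Wikipedia categories.
TYPE_INVENTORY = ["city", "football club", "england", "person"]

# Hypothetical entity -> type sets standing in for Wikipedia category data.
ENTITY_TYPES = {
    "Liverpool": {"city", "england"},
    "Liverpool F.C.": {"football club", "england"},
}


def distant_label(linked_entity):
    """Build a multi-hot training target for a hyperlinked mention.

    The hyperlink tells us which entity the mention refers to, and that
    entity's types become the (distantly supervised) labels.
    """
    types = ENTITY_TYPES[linked_entity]
    return [1.0 if t in types else 0.0 for t in TYPE_INVENTORY]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def link(type_logits, candidates):
    """Return the candidate whose types get the largest summed probability."""
    # One independent probability per type, as in a sigmoid decoder.
    probs = {t: sigmoid(z) for t, z in zip(TYPE_INVENTORY, type_logits)}
    # Assumed scoring rule: unnormalized sum over each candidate's types.
    scores = {c: sum(probs[t] for t in ENTITY_TYPES[c]) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up logits for the mention "Liverpool" in a sports sentence:
# the typing model is assumed to favor "football club" over "city".
logits = [-2.0, 3.0, 1.0, -4.0]
best, scores = link(logits, ["Liverpool", "Liverpool F.C."])
print(best, scores)
```

Because the typing model never predicts an entity identifier directly, a candidate that was rare or absent in the training data can still be selected as long as its types are predicted well from context, which is the intuition behind the generalization claim above.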

Fine-Grained Entity Typing for Domain Independent Entity Linking
