Abstract: Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only a few people are interested in, and at times it lags behind in adding newly emerging entities. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can be mined automatically, though, is a high-quality representation, given the contexts in which a new entity occurs in text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-and-egg problem. To address this, our approach automatically generates candidate occurrences of entities and prompts the user for feedback on whether each occurrence refers to the actual entity in question. This feedback gradually improves the knowledge about the entity and allows our methods to provide better candidate suggestions, which keeps the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.

2ED: An Efficient Entity Extraction Algorithm Using Two-Level Edit-Distance

A Technical Report: Entity Extraction Using Both Character-based and Token-based Similarity

Efficient Approximate Entity Extraction with Edit Distance Constraints

An Easy-to-use Evaluation Framework for Benchmarking Entity Recognition and Disambiguation Systems

Faerie: efficient filtering algorithms for approximate dictionary-based entity extraction

An Efficient Trie-based Method for Approximate Entity Extraction with Edit-Distance Constraints

Extending Dictionary-Based Entity Extraction to Tolerate Errors

A Unified Framework for Approximate Dictionary-Based Entity Extraction

Research of Entity Matching Based on Multiple Heterogenous Data

Boosting approximate dictionary-based entity extraction with synonyms

Entity Disambiguation via Fusion Entity Decoding

A Knowledge Graph Entity Disambiguation Method Based on Entity-Relationship Embedding and Graph Structure Embedding

Entity disambiguation with context awareness in user-generated short texts

Entity Extraction with Knowledge from Web Scale Corpora

Discovering Entities with Just a Little Help from You

A New Entity Extraction Method Based on Machine Reading Comprehension

Document-level Entity-based Extraction as Template Generation

EXACT: Attributed Entity Extraction By Annotating Texts

Crowd-Guided Entity Matching with Consolidated Textual Data

Graph-Based Jointly Modeling Entity Detection and Linking in Domain-Specific Area

CTextEM: Using Consolidated Textual Data for Entity Matching