Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems

Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji, Xiang Ren, Jiawei Han, Lin Zhao, James A Hendler
DOI: https://doi.org/10.1089/big.2017.0012
IF: 4.426
Big Data
Abstract: Automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forum) show performance comparable to state-of-the-art supervised typing systems trained on a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
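To make the pipeline in the abstract concrete, here is a minimal sketch of the general idea: each mention gets three vectors (general, context, knowledge), the vectors are combined, and mentions are grouped by hierarchical clustering. This is not the paper's joint clustering and linking algorithm. It substitutes plain average-linkage agglomerative clustering with cosine distance, omits the knowledge-base linking step entirely, and uses made-up two-dimensional vectors. The names `combine`, `agglomerative_cluster`, the `threshold` parameter, and the toy mentions are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def combine(general, context, knowledge):
    """Assumption: the three representations are simply concatenated."""
    return list(general) + list(context) + list(knowledge)

def agglomerative_cluster(vectors, threshold):
    """Average-linkage clustering; stop when the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest cluster pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = avg_dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy data (invented for illustration): general, context, knowledge vectors.
mentions = {
    "Obama": combine([0.9, 0.1], [0.8, 0.2], [1.0, 0.0]),
    "Merkel": combine([0.8, 0.2], [0.9, 0.1], [1.0, 0.0]),
    "Paris": combine([0.1, 0.9], [0.2, 0.8], [0.0, 1.0]),
    "Berlin": combine([0.2, 0.8], [0.1, 0.9], [0.0, 1.0]),
}
names = list(mentions)
clusters = agglomerative_cluster([mentions[n] for n in names], threshold=0.3)
typed = [sorted(names[i] for i in c) for c in clusters]
print(typed)
```

With these toy vectors the person-like and place-like mentions end up in separate clusters. In the framework the abstract describes, such clusters would additionally be linked to knowledge-base types to obtain type labels, a step this sketch does not attempt.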