Mining High-Quality Fine-Grained Type Information from Chinese Online Encyclopedias.

Maoxiang Hao, Zhixu Li, Yan Zhao, Kai Zheng
DOI: https://doi.org/10.1007/978-3-030-02925-8_25
2018-01-01
Abstract: Entity typing is a necessary step in building knowledge graphs. Many efforts have been made to mine type information for entities from online encyclopedias, but usually only coarse-grained type information can be obtained. This is not fine-grained enough for knowledge graph construction or query answering. The situation is even worse for entities in Chinese. In this paper, we mine high-quality fine-grained type information for entities from the title-labels and infoboxes of an entity's encyclopedia page, and also from the abstract and crowd-labels on that page. These additional sources provide many more candidate fine-grained types, albeit with noise. To keep the mined type information high-quality, we initially extract only reliable type information from the title-labels and infoboxes. We then put entities, attributes, values and types into one graph, from which path information can be obtained between each candidate entity-type pair. A proposed Path-CNN binary classification model then uses this path information to identify additional correct entity-type pairs from the graph. Compared with the previous approach and DBpedia, our work mines considerably more high-quality fine-grained type information for entities from the online encyclopedia. By applying our approach to the largest Chinese online encyclopedia, Baidu Baike, we have generated 25,651,022 pieces of type information (with more than 80% accuracy) for the entities in this encyclopedia.
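The abstract does not specify the graph schema, the edge labels, the maximum path length, or how paths are fed to Path-CNN. The Python below is a minimal sketch under those assumptions. It builds one graph from hypothetical (head, relation, tail) triples and enumerates simple paths of at most three edges between a candidate entity-type pair. It then turns each path's relation sequence into padded integer ids of the kind a convolutional classifier could consume. The triples, the `has_type` relation, the `^-1` inverse-edge convention, and the function names `build_graph`, `find_paths` and `encode_relation_paths` are all illustrative. The Path-CNN model itself is not reproduced.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an adjacency map from (head, relation, tail) triples.

    Each triple also gets an inverse edge (hypothetical '^-1' suffix),
    so paths can be traversed in both directions.
    """
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
        graph[tail].append((rel + "^-1", head))
    return graph


def find_paths(graph, source, target, max_edges=3):
    """Enumerate simple paths from source to target with at most max_edges edges.

    A path is stored as an alternating list: [node, rel, node, rel, ..., node].
    """
    results = []

    def dfs(node, path, visited):
        if node == target and len(path) > 1:
            results.append(list(path))
            return
        if (len(path) - 1) // 2 >= max_edges:  # edge budget exhausted
            return
        for rel, nxt in graph.get(node, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            path.extend([rel, nxt])
            dfs(nxt, path, visited)
            path.pop()
            path.pop()
            visited.remove(nxt)

    dfs(source, [source], {source})
    return results


def encode_relation_paths(paths, max_edges=3):
    """Map each path's relation sequence to integer ids, padded with 0."""
    vocab = {"<pad>": 0}
    encoded = []
    for path in paths:
        rels = path[1::2]  # relations sit at odd positions
        ids = [vocab.setdefault(r, len(vocab)) for r in rels]
        ids += [0] * (max_edges - len(ids))
        encoded.append(ids)
    return encoded, vocab


# Hypothetical toy graph: reliable types plus one shared attribute value.
triples = [
    ("Yao Ming", "has_type", "athlete"),
    ("Yao Ming", "position", "center"),
    ("Wang Zhizhi", "has_type", "athlete"),
    ("Wang Zhizhi", "position", "center"),
    ("Wang Zhizhi", "has_type", "basketball player"),
]

graph = build_graph(triples)
# Candidate entity-type pair to be judged: (Yao Ming, basketball player).
paths = find_paths(graph, "Yao Ming", "basketball player")
encoded, vocab = encode_relation_paths(paths)

for p in paths:
    print(p)
print(encoded)
```

Here the candidate pair (Yao Ming, basketball player) is reached through a shared type and through a shared attribute value with an entity already typed as a basketball player. In the described approach, a trained classifier decides whether such path evidence makes the pair correct. The enumeration alone does not.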