Knowledge graph construction from multiple online encyclopedias
Tianxing Wu,Haofen Wang,Cheng Li,Guilin Qi,Xing Niu,Meng Wang,Lin Li,Chaomin Shi
DOI: https://doi.org/10.1007/s11280-019-00719-4
2019-01-01
World Wide Web
Abstract:In recent years, lots of knowledge graphs built from Wikipedia, the largest multilingual online encyclopedia, have been published on the Web to support various applications. However, since non-English data in Wikipedia are sparse, some projects work on knowledge graph construction from multiple non-English online encyclopedias, but many technical details are missing, so it is hard to reuse their frameworks or techniques. In this paper, we propose a new framework to solve knowledge graph construction from multiple online encyclopedias. The core modules are knowledge extraction and knowledge linking. Knowledge extraction consists of regular extraction, i.e., extracting targeted article contents in the whole online encyclopedias periodically, and live extraction, which only extracts the article contents of new and updated entities. Knowledge linking utilizes heuristic lightweight entity matching strategies and a semi-supervised learning method to find duplicated entities and properties from different online encyclopedias. Experimental results show that our approaches for knowledge extraction and linking outperform state-of-the-art baselines in different evaluation metrics, and our framework can generate a large-scale knowledge graph after inputting multiple online encyclopedias.