Mining RDF from Tables in Chinese Encyclopedias
Weiming Lu, Zhenyu Zhang, Renjie Lou, Hao Dai, Shansong Yang, Baogang Wei
DOI: https://doi.org/10.1007/978-3-319-25207-0_24
2015-01-01
Abstract: Web table understanding has recently attracted a number of studies. However, many of them focus on tables in English, because they usually rely on knowledge bases, and existing knowledge bases such as DBpedia, YAGO, Freebase and Probase mainly contain knowledge in English. In this paper, we focus on extracting RDF triples from tables in Chinese encyclopedias. First, we constructed a Chinese knowledge base through taxonomy mining and class attribute mining. Then, with the help of this knowledge base, we extracted triples from tables through column scoring, table classification and RDF extraction. In our experiments, we applied our approach to 6,618,544 articles from Hudong Baike containing 764,292 tables, and extracted about 1,053,407 unique, new RDF triples with an estimated accuracy of 90.2%, outperforming other similar works.
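As a rough illustration of what "extracting triples from tables" means, the Python sketch below turns a simple relational table into (subject, predicate, object) triples. It is not the authors' method: the column score used here (fraction of distinct non-empty values, picking the most distinct column as the subject) and the names `pick_subject_column` and `table_to_triples` are hypothetical, and the sketch omits the knowledge-base matching and table classification the abstract describes.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def pick_subject_column(headers: List[str], rows: List[List[str]]) -> int:
    # Hypothetical column score: fraction of rows with a distinct non-empty value.
    # The highest-scoring column (first one on ties) is treated as the subject column.
    def score(i: int) -> float:
        values = [row[i] for row in rows if row[i]]
        return len(set(values)) / len(rows)

    return max(range(len(headers)), key=score)


def table_to_triples(headers: List[str], rows: List[List[str]]) -> List[Triple]:
    # Emit one triple per non-empty cell outside the subject column,
    # using the column header as the predicate (a simplifying assumption).
    if not rows:
        return []
    subj = pick_subject_column(headers, rows)
    triples: List[Triple] = []
    for row in rows:
        subject = row[subj]
        if not subject:
            continue
        for i, (header, value) in enumerate(zip(headers, row)):
            if i != subj and value:
                triples.append((subject, header, value))
    return triples


headers = ["City", "Province"]
rows = [["Hangzhou", "Zhejiang"], ["Suzhou", "Jiangsu"], ["Ningbo", "Zhejiang"]]
for t in table_to_triples(headers, rows):
    print(t)
```

Here "City" scores 1.0 and "Province" scores 2/3, so "City" becomes the subject column, and each remaining cell becomes one triple such as `("Hangzhou", "Province", "Zhejiang")`. A real system would additionally map headers and cell values to knowledge-base properties and entities before emitting RDF.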