Extracting Knowledge from Web Tables Based on DOM Tree Similarity.

Xiaolong Wu, Cungen Cao, Ya Wang, Jianhui Fu, Shi Wang
DOI: https://doi.org/10.1007/978-3-319-47650-6_24
2016-01-01
Abstract: Structured (semi-structured) knowledge extraction from Web tables is an important way to obtain high-quality knowledge. Unlike most extraction methods, which rely on external knowledge bases to understand tables, our method uses the inherent similarities among tables to determine their semantic structure. Based on a comprehensive analysis of the structures of tables in various forms, we propose a novel method, based on dynamic time warping (DTW), for calculating the DOM tree similarity between web tables and for clustering the tables. Experiments on a corpus of 5000 randomly selected Wikipedia tables show that the table clustering results are close to those of a classification based on empirical approaches, and that the quality of the knowledge extracted from the tables is satisfactory without the use of external knowledge bases.
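The abstract does not specify how DOM trees are turned into sequences or how the DTW cost is defined, so the following is only a minimal sketch under our own assumptions. Each table's DOM tree is flattened by a pre-order traversal into `depth:tag` tokens. Two token sequences are aligned with textbook DTW using a 0/1 mismatch cost, and the distance is normalized into a similarity in (0, 1]. The names `serialize`, `dtw_distance`, `tree_similarity`, `cluster`, the cost function, the normalization, and the greedy threshold clustering with `threshold=0.8` are hypothetical illustrations, not the authors' method.

```python
from typing import List, Tuple

# Hypothetical toy representation: a DOM node is (tag, [children]).
Node = Tuple[str, list]


def serialize(node: Node, depth: int = 0) -> List[str]:
    """Pre-order traversal emitting one 'depth:tag' token per node (assumed encoding)."""
    tag, children = node
    tokens = [f"{depth}:{tag}"]
    for child in children:
        tokens.extend(serialize(child, depth + 1))
    return tokens


def dtw_distance(a: List[str], b: List[str]) -> float:
    """Textbook DTW; the 0/1 token-mismatch cost is an assumption."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def tree_similarity(t1: Node, t2: Node) -> float:
    """Hypothetical normalization: 1 - distance / (len(a) + len(b))."""
    a, b = serialize(t1), serialize(t2)
    return 1.0 - dtw_distance(a, b) / (len(a) + len(b))


def cluster(tables: List[Node], threshold: float = 0.8) -> List[List[int]]:
    """Hypothetical greedy clustering: join the first cluster whose seed table is similar enough."""
    clusters: List[List[int]] = []
    for idx, table in enumerate(tables):
        for members in clusters:
            if tree_similarity(tables[members[0]], table) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy usage: two data tables that differ only in row count, plus an unrelated fragment.
def make_row(n_cells: int) -> Node:
    return ("tr", [("td", []) for _ in range(n_cells)])


two_rows = ("table", [make_row(3), make_row(3)])
three_rows = ("table", [make_row(3), make_row(3), make_row(3)])
other = ("div", [("p", [])])
print(cluster([two_rows, three_rows, other]))  # → [[0, 1], [2]]
```

Because DTW lets one token align with several, the extra row in `three_rows` costs only a single mismatch (its `tr` token). Its similarity to `two_rows` is therefore 1 − 1/22 ≈ 0.95, and the two tables land in the same cluster. The `div` fragment shares no tokens with them and forms its own cluster.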