Entity Column Discovery Algorithm of Web Table

Lifang ZHANG, Ning WANG, Fei QI
DOI: https://doi.org/10.3969/j.issn.1000-3428.2017.12.031
2018-01-01
Abstract: Machines cannot directly understand the semantic information in Web tables. Traditional entity column detection methods find entity columns using header information and a knowledge base, so they are not applicable to tables without headers. This paper proposes an entity column discovery algorithm for Web tables based on approximate functional dependencies over column values and on normalization. It is used to annotate entity columns in tables that have no header or whose full header cannot be restored, including tables with multiple entity columns. First, the approximate functional dependencies between Web table attributes are detected from the attribute values in the table. Next, noisy functional dependencies are filtered out according to the characteristics of Web tables. Finally, the entity columns of the Web table are obtained by normalizing the functional dependency set. Compared with an entity column detection algorithm based on a knowledge base, the proposed algorithm is independent of header information, achieves 3% to 5% higher precision and recall, and can be applied in more scenarios.
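The pipeline in the abstract (detect approximate functional dependencies from column values, then derive entity columns from the dependency set) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `max_error` threshold, and the example table are hypothetical. The error measure is the commonly used "fraction of rows to remove" measure, which the abstract does not specify. The noise filtering step is omitted, and the normalization step is replaced by a much simpler heuristic: pick the column that approximately determines the most other columns.

```python
from collections import Counter, defaultdict


def afd_error(rows, lhs, rhs):
    """Fraction of rows that must be removed so that column lhs -> column rhs holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # Within each lhs value, keep only the rows carrying the most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1.0 - kept / len(rows)


def discover_afds(rows, max_error=0.1):
    """Return all single-column pairs (lhs, rhs) whose error is at most max_error (assumed threshold)."""
    n_cols = len(rows[0])
    return {
        (lhs, rhs)
        for lhs in range(n_cols)
        for rhs in range(n_cols)
        if lhs != rhs and afd_error(rows, lhs, rhs) <= max_error
    }


def candidate_entity_columns(rows, max_error=0.1):
    """Simplified stand-in for normalization: columns that determine the most other columns."""
    n_cols = len(rows[0])
    determined = Counter(lhs for lhs, _ in discover_afds(rows, max_error))
    best = max((determined[c] for c in range(n_cols)), default=0)
    if best == 0:
        return []
    return [c for c in range(n_cols) if determined[c] == best]


# Hypothetical headerless table: city, country, continent.
EXAMPLE_ROWS = [
    ("Paris", "France", "Europe"),
    ("Lyon", "France", "Europe"),
    ("Berlin", "Germany", "Europe"),
    ("Munich", "Germany", "Europe"),
    ("Tokyo", "Japan", "Asia"),
]

print(candidate_entity_columns(EXAMPLE_ROWS))  # [0]
```

In this example the city column determines both other columns, so it is returned as the candidate. A column of unique row identifiers would also determine every other column, which is one reason a filtering step such as the one described in the abstract is needed in practice.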