Extracting information from Web tables based on abstract semantic model

Ning Gu, Guowen Wu, Xiaoyuan Wu, Baile Shi
2001-01-01
Ruan Jian Xue Bao/Journal of Software
Abstract: This paper presents a new method for extracting information from the tables in Web documents. The method uses a table abstract semantic model to describe complicated tables and to interpret them from a semantic point of view. This reduces how much the extraction process depends on differences in table layout structure. The method also exploits characteristics of HTML and natural language processing techniques, and defines a set of heuristic rules to help identify table items. Based on this approach, a prototype system, EXTable, was implemented, and experiments show that it achieves better results.
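The abstract does not spell out EXTable's heuristic rules or the structure of its abstract semantic model. The sketch below is a hypothetical illustration only and not the authors' method. It shows how HTML features (`<th>` and `<b>`/`<strong>` markup) plus a trivial lexical cue (numeric versus non-numeric cell text) might be used to decide which cells are labels. It then maps a simple table into layout-independent (row label, column label, value) triples, a crude stand-in for a semantic representation. The names `TableGrid`, `is_label`, and `extract_triples` are invented for this example. The sketch assumes one well-formed table with no `rowspan`/`colspan` and no nested tables, and it does no real natural language processing.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect one well-formed HTML table into rows of cell dicts.

    Hypothetical helper: assumes no rowspan/colspan and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # cell being filled while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = {"text": "", "th": tag == "th", "bold": False}
        elif tag in ("b", "strong") and self._cell is not None:
            self._cell["bold"] = True  # any bold markup marks the whole cell

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._cell["text"] = self._cell["text"].strip()
            self.rows[-1].append(self._cell)
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell["text"] += data


def looks_numeric(text):
    """Lexical cue: treat text like '2,100,000' or '48' as a data value."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def is_label(cell):
    """Hypothetical heuristic: header markup or non-numeric text suggests a label."""
    return cell["th"] or cell["bold"] or not looks_numeric(cell["text"])


def extract_triples(html):
    """Map a simple table to (row label, column label, value) triples."""
    parser = TableGrid()
    parser.feed(html)
    rows = [r for r in parser.rows if r]
    if not rows:
        return []
    has_col_header = all(is_label(c) for c in rows[0])
    body = rows[1:] if has_col_header else rows
    has_row_header = bool(body) and all(is_label(r[0]) for r in body)
    col_labels = [c["text"] for c in rows[0]] if has_col_header else None
    start = 1 if has_row_header else 0  # skip the label column if present
    triples = []
    for i, row in enumerate(body):
        row_label = row[0]["text"] if has_row_header else f"row{i}"
        for j in range(start, len(row)):
            if col_labels and j < len(col_labels):
                col_label = col_labels[j]
            else:
                col_label = f"col{j}"
            triples.append((row_label, col_label, row[j]["text"]))
    return triples
```

For example, a table with a `<th>` header row `City | Population | Area` and body rows `Paris | 2,100,000 | 105` and `Lyon | 520,000 | 48` yields `("Paris", "Population", "2,100,000")`, `("Paris", "Area", "105")`, and so on. Because the triples do not depend on whether labels appear as `<th>` cells, bold text, or plain text, they loosely mirror the abstract's goal of reducing dependence on table layout.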