Visual Features and Domain Ontology-Based Web Information Extraction

ZHANG Xin, CHEN Mei, WANG Han-hu, WANG Yan-ran
DOI: https://doi.org/10.3969/j.issn.1673-629x.2011.02.015
2011-01-01
Abstract: This paper proposes a Web information extraction algorithm based on visual features and domain ontology to address the problem of automatic Web information extraction. Building on domain ontology-based Web page information extraction, the algorithm uses the visual characteristics of a sample Web page to accurately delineate the information extraction area, and obtains the extraction paths of the page's information items by combining DOM tree technology with heuristic learning of extraction paths. The extraction rules for the information items are then obtained through the domain ontology, which is automatically generated from the extraction paths. Compared with existing approaches, using this algorithm for Web information extraction offers higher recall and precision, lower time complexity, a lighter burden on the user, and a higher degree of automation.
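The idea of learning a DOM extraction path from a sample page and reusing it on similar pages can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the visual-feature step that delineates the extraction area is omitted, and the names `ONTOLOGY`, `PathCollector`, `learn_paths`, and `extract`, the sample pages, and the "value follows its label" heuristic are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical mini "domain ontology": concept -> labels that may appear on a page.
ONTOLOGY = {"price": {"price", "cost"}, "author": {"author", "written by"}}

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Records (DOM path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as "tag[index]" steps
        self.counts = [{}]   # per-level sibling counters, keyed by tag name
        self.nodes = []      # collected (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        seen = self.counts[-1]
        seen[tag] = seen.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append(f"{tag}[{seen[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/" + "/".join(self.stack), text))


def learn_paths(sample_html):
    """Assumed heuristic: the text node right after an ontology label holds its value."""
    collector = PathCollector()
    collector.feed(sample_html)
    paths = {}
    for (_, label), (value_path, _) in zip(collector.nodes, collector.nodes[1:]):
        key = label.rstrip(":").strip().lower()
        for concept, labels in ONTOLOGY.items():
            if key in labels:
                paths[concept] = value_path
    return paths


def extract(html, paths):
    """Apply learned paths to another page that shares the sample's layout."""
    collector = PathCollector()
    collector.feed(html)
    by_path = {}
    for path, text in collector.nodes:
        by_path.setdefault(path, text)  # keep the first text node found at each path
    return {concept: by_path.get(path) for concept, path in paths.items()}


# Hypothetical sample page and a second page with the same structure.
SAMPLE = ("<html><body><table>"
          "<tr><td>Author:</td><td>Ann Lee</td></tr>"
          "<tr><td>Price:</td><td>12.50</td></tr>"
          "</table></body></html>")
NEW_PAGE = SAMPLE.replace("Ann Lee", "Bo Chen").replace("12.50", "9.99")

paths = learn_paths(SAMPLE)
record = extract(NEW_PAGE, paths)
```

Here `paths` maps each ontology concept to an indexed DOM path learned from the sample page, and `record` holds the values found at those paths on the second page. A real system would first restrict the search to the visually delineated region and would need a more robust label-to-value heuristic than adjacency.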