A Highly Adaptable Web Information Extractor Using Graph Data Model

Qi Guo, Lizhu Zhou, Zhiqiang Zhang, Jianhua Feng
DOI: https://doi.org/10.1007/978-3-540-24655-8_105
2004-01-01
Abstract: We present an approach to building a highly adaptable extractor for collecting data from diverse Web sites. The approach uses a graph model to represent page content and structure, together with their various types of features. The generated graph is accompanied by a script, written in a special language called GQML, that contains the extraction rules. Running the script transforms the graph into a specified format, such as an XML file, so that data from various Web sites is stored in a uniform format. The experimental results show that the presented approach is both effective and efficient.
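The pipeline the abstract describes (a page graph with per-node features, rules applied to it, uniform XML out) can be illustrated with a minimal sketch. The abstract does not show GQML syntax, so the rules below are plain Python predicates standing in for it. `PageGraph`, `Node`, `extract_to_xml`, the feature names, and the sample data are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """One content node of a page, with arbitrary features (hypothetical)."""
    node_id: int
    text: str
    features: dict = field(default_factory=dict)


@dataclass
class PageGraph:
    """Toy stand-in for a graph model of a Web page (not the paper's model)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (parent_id, child_id) pairs

    def add_node(self, node_id, text, **features):
        self.nodes[node_id] = Node(node_id, text, features)

    def add_edge(self, parent_id, child_id):
        self.edges.append((parent_id, child_id))

    def children(self, node_id):
        # Children are returned in the order their edges were added.
        return [self.nodes[c] for p, c in self.edges if p == node_id]


def extract_to_xml(graph, root_id, rules):
    """Apply rules (output field name -> predicate on Node) to the children
    of root_id and serialize the matches as one XML record.

    The predicates are an assumed stand-in for GQML extraction rules.
    """
    record = ET.Element("record")
    for child in graph.children(root_id):
        for field_name, predicate in rules.items():
            if predicate(child):
                ET.SubElement(record, field_name).text = child.text
                break  # first matching rule wins for this node
    return ET.tostring(record, encoding="unicode")


# Invented sample page: a container with a heading and a price.
graph = PageGraph()
graph.add_node(0, "", tag="div")
graph.add_node(1, "Graph Databases", tag="h2", bold=True)
graph.add_node(2, "$39.99", tag="span", css_class="price")
graph.add_edge(0, 1)
graph.add_edge(0, 2)

rules = {
    "title": lambda n: n.features.get("tag") == "h2",
    "price": lambda n: n.features.get("css_class") == "price",
}

xml_output = extract_to_xml(graph, 0, rules)
print(xml_output)
```

Keeping the rules separate from the graph is what would let the same output schema be reused across sites: under this sketch's assumptions, only the predicates change from one site to the next.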