SG-WRAM Schema Guided Wrapper Maintenance: a Demonstration

XF Meng, HY Wang, DD Hu, MZ Gu
DOI: https://doi.org/10.1109/icde.2003.1260856
2003-01-01
Abstract: We propose a novel schema-guided approach for wrapper maintenance, called SG-WRAM. SG-WRAP can generate a wrapper that extracts data from an HTML document and produces an XML document conforming to a user-defined schema. Based on these observations, we carry out the maintenance in four sequential steps. First, syntactic features, data pattern and notation are obtained from the schema, the previous rule and the extracted results, and these are then used to recognize the data items. Next, the recognized items are grouped according to the given schema, so that each group is an instance of that schema. Finally, representative instances are selected to re-induce the extraction rule. We name these four steps features discovery, item recovery, block configuration and wrapper reparation, respectively. The system to be demonstrated is implemented in Java. We also consider the major algorithms used in SG-WRAM.
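To make the four steps named in the abstract more concrete, the following is a minimal Java sketch under strong simplifying assumptions. It is not the SG-WRAM implementation: the class name MaintenanceSketch, all method names, the character-class generalization used for features discovery, and the "keep only complete instances" criterion are hypothetical choices made for illustration. The changed page is assumed to be already reduced to a list of text tokens, and re-inducing the actual extraction rule is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class MaintenanceSketch {

    // Features discovery (simplified assumption): generalize one previously
    // extracted value into a regex of character classes, e.g. digits -> \d+.
    public static Pattern discoverPattern(String sample) {
        StringBuilder regex = new StringBuilder();
        String previous = "";
        for (char c : sample.toCharArray()) {
            String cls;
            if (Character.isDigit(c)) cls = "\\d";
            else if (Character.isLetter(c)) cls = "\\p{L}";
            else cls = "\\" + c; // escape any other character literally
            // Collapse a run of the same class into a single "one or more".
            if (!cls.equals(previous)) regex.append(cls).append('+');
            previous = cls;
        }
        return Pattern.compile(regex.toString());
    }

    // Item recovery: label each token of the changed page with the first
    // schema field whose pattern matches the whole token; drop the rest.
    public static List<Map.Entry<String, String>> recoverItems(
            List<String> tokens, Map<String, Pattern> features) {
        List<Map.Entry<String, String>> items = new ArrayList<>();
        for (String token : tokens) {
            for (Map.Entry<String, Pattern> f : features.entrySet()) {
                if (f.getValue().matcher(token).matches()) {
                    items.add(Map.entry(f.getKey(), token));
                    break;
                }
            }
        }
        return items;
    }

    // Block configuration: group items into schema instances; a repeated
    // field is taken as the start of the next instance.
    public static List<Map<String, String>> configureBlocks(
            List<Map.Entry<String, String>> items) {
        List<Map<String, String>> instances = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (Map.Entry<String, String> item : items) {
            if (current.containsKey(item.getKey())) {
                instances.add(current);
                current = new LinkedHashMap<>();
            }
            current.put(item.getKey(), item.getValue());
        }
        if (!current.isEmpty()) instances.add(current);
        return instances;
    }

    // First half of wrapper reparation (hypothetical criterion): keep only
    // instances that fill every schema field as candidates for re-induction.
    public static List<Map<String, String>> selectRepresentatives(
            List<Map<String, String>> instances, Set<String> schemaFields) {
        List<Map<String, String>> complete = new ArrayList<>();
        for (Map<String, String> instance : instances) {
            if (instance.keySet().containsAll(schemaFields)) complete.add(instance);
        }
        return complete;
    }

    public static void main(String[] args) {
        // Values extracted by the old wrapper before the page changed (made up).
        Map<String, Pattern> features = new LinkedHashMap<>();
        features.put("title", discoverPattern("Java Basics"));
        features.put("price", discoverPattern("$12.99"));

        // Text tokens of the changed page (made up).
        List<String> tokens = List.of(
                "Learning XML", "$39.95", "Add to cart", "Web Mining", "$54.00");

        List<Map<String, String>> instances =
                configureBlocks(recoverItems(tokens, features));
        System.out.println(selectRepresentatives(instances, features.keySet()));
    }
}
```

In this toy run the token "Add to cart" matches neither pattern and is dropped, while the remaining tokens are grouped into two title/price instances. A realistic system would draw on more than one sample per field and on the other feature types the abstract mentions, rather than a single character-class pattern.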