A Hybrid Method for Web Data Extraction

Y Wang, LZ Zhou
DOI: https://doi.org/10.1109/wi.2003.1241229
2003-01-01
Abstract: Web data extraction refers to the technology that helps people find the information they want on the Web. In this paper, we first classify existing data extraction algorithms into two classes, Top-Down and Bottom-Up, and then analyze their strengths and weaknesses in terms of extraction accuracy. On the basis of this analysis, we present a hybrid algorithm, Bi-Direction Data Extraction (BiDDE for short), which combines the strengths of both Top-Down and Bottom-Up algorithms while avoiding their weaknesses. The experimental results show that BiDDE not only achieves higher accuracy than either the Top-Down or the Bottom-Up algorithm but also delivers satisfactory performance.