Research and Implementation of Web Page Cleaning

周源远,王继成,郑刚,张福炎
DOI: https://doi.org/10.3969/j.issn.1000-3428.2002.09.019
2002-01-01
Abstract: The paper introduces the concept of Web page cleaning and presents a rule-based method for distinguishing the information blocks in Web pages. A system implementing the method has also been developed. The implementation is based on the DOM tree structure of the Web page, and its performance was evaluated manually. The evaluation results show that the method is practical, fast, and precise.