Topic information extraction from Web pages based on tree comparison

Zhu Menglin, Li Guangyao, Zhou Yimin
DOI: https://doi.org/10.3969/j.issn.1674-7720.2011.19.024
2011-01-01
Abstract: To automatically extract information from Web pages on the Internet, which holds massive amounts of information, this paper presents an approach based on tree comparison. The approach compares the tree built from the target page with the trees built from similar pages in order to simplify the target page. Extraction rules are generated on this basis and then used to extract topic information from the target Web page. Experimental results show that this extraction method is precise and efficient.
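The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of simplifying a target page by comparing its tree with the trees of similar pages. It is not the authors' algorithm. All names (`Node`, `TreeBuilder`, `signature`, `extract_topic`) are hypothetical, the rule-generation step is omitted, and the sketch assumes well-formed HTML in which every element has an explicit closing tag. Subtrees that also appear verbatim in a similar page (navigation bars, footers) are treated as template noise and pruned; the text that remains is taken as the topic information.

```python
from html.parser import HTMLParser


class Node:
    """One element of a simplified tag tree (hypothetical structure)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.text = ""
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes every element has an explicit closing tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def signature(node):
    """Hashable fingerprint of a subtree: tag, own text, child fingerprints."""
    return (node.tag, node.text, tuple(signature(c) for c in node.children))


def collect_signatures(node, seen):
    """Record the fingerprint of every subtree below node."""
    for child in node.children:
        seen.add(signature(child))
        collect_signatures(child, seen)


def prune(node, shared):
    """Drop subtrees that also occur in a similar page, then recurse."""
    node.children = [c for c in node.children if signature(c) not in shared]
    for child in node.children:
        prune(child, shared)


def extract_text(node):
    """Collect the non-empty text of the remaining nodes in document order."""
    parts = [node.text] if node.text else []
    for child in node.children:
        parts.extend(extract_text(child))
    return parts


def extract_topic(target_html, similar_htmls):
    """Return the text left in the target page after pruning shared subtrees."""
    shared = set()
    for html in similar_htmls:
        collect_signatures(build_tree(html), shared)
    target = build_tree(target_html)
    prune(target, shared)
    return extract_text(target)
```

Called with a target page and one similar page from the same site, `extract_topic(target_html, [similar_html])` returns only the text of the subtrees that differ between the two pages. Exact subtree matching is a deliberately simple comparison criterion chosen for this sketch; a practical system would need to tolerate small template variations.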