Automatic Web Information Extraction Based on Repetitive Pattern

胡仁龙,袁春风,武港山,濮小佳
DOI: https://doi.org/10.3969/j.issn.1000-3428.2008.22.025
2008-01-01
Abstract: Many online shopping Web sites exist on the WWW, and the commodity information in their pages can be extracted for e-commerce and Web-query applications. This paper presents an automated approach to Web information extraction for these sites. The approach locates the topic area of a single Web page by detecting repetitive patterns and analyzing the characteristics of the topic area. No human interaction is required during extraction. The approach was tested on 10 online shopping sites, and the experimental results show that it is effective.
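
The abstract does not spell out how repetitive patterns are detected, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a product listing appears as a run of sibling elements with identical tag structure, and it picks the element whose children repeat the largest structure most often. The names `Node`, `TreeBuilder`, `signature`, and `find_topic_area`, as well as the scoring rule (repeat count times subtree size), are hypothetical choices made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag; we do not descend into them.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A bare-bones element node: tag name, attributes, parent, children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree from HTML (text nodes are ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def signature(node):
    """Structural fingerprint of a subtree: tag names only, no text or attributes."""
    return (node.tag, tuple(signature(child) for child in node.children))


def signature_size(sig):
    """Number of elements described by a signature."""
    return 1 + sum(signature_size(child) for child in sig[1])


def find_topic_area(html):
    """Return the node whose children repeat the same structure the most.

    Assumed score: (repeat count) * (elements per repeated subtree),
    counted only when a structure occurs at least twice.
    """
    builder = TreeBuilder()
    builder.feed(html)
    best, best_score = None, 0
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = Counter(signature(child) for child in node.children)
        if not counts:
            continue
        sig, repeats = counts.most_common(1)[0]
        score = repeats * signature_size(sig) if repeats >= 2 else 0
        if score > best_score:
            best, best_score = node, score
    return best
```

Weighting the repeat count by subtree size is one way to prefer a list of rich product records over a short navigation bar made of bare links. A real page would also need tolerance for small structural differences between records, which this exact-match sketch does not attempt.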