Application and Design of Web Information Extraction System Based on Pattern Discovery

蔡霞 (Cai Xia), 张森 (Zhang Sen), 周宇 (Zhou Yu)
DOI: https://doi.org/10.3969/j.issn.1671-7848.2003.03.011
2003-01-01
Abstract: Given the rapid growth of public information sources on the World Wide Web, extracting data from these sources is increasingly attractive. Current Web sites present information on various topics in various formats, and a user often has to spend a great deal of effort manually locating and extracting useful data from them. A reference architecture based on pattern discovery is developed, which applies PAT trees to discover patterns. The process requires no human intervention and no training examples. Experimental results show that it can achieve a high extraction rate on popular search engines.
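The abstract does not spell out the algorithm. The Python sketch below only illustrates the general idea of PAT-tree-style pattern discovery. It makes several assumptions that are not taken from the paper. First, a page is encoded as a string of tag tokens, with every text run collapsed to `TEXT`. Second, instead of building a PAT tree, it sorts the suffixes and computes the longest common prefix (LCP) of each adjacent pair. Each such LCP corresponds to a branching node of a PAT tree over the same sequence. Third, candidates are kept only if they have at least `min_count` non-overlapping occurrences, and they are ranked by coverage (occurrences times length). All function names (`tokenize`, `discover_patterns`, `extract_records`, ...) are hypothetical.

```python
import re

# Matches an opening or closing HTML tag; group 1 is "/" for closing tags, group 2 the tag name.
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def tokenize(html):
    """Encode a page as (symbol, text) pairs: tags become '<name>'/'</name>', text runs become 'TEXT'."""
    tokens, pos = [], 0
    for m in TAG_RE.finditer(html):
        text = html[pos:m.start()].strip()
        if text:
            tokens.append(("TEXT", text))
        tokens.append((f"<{m.group(1)}{m.group(2).lower()}>", None))
        pos = m.end()
    tail = html[pos:].strip()
    if tail:
        tokens.append(("TEXT", tail))
    return tokens


def count_non_overlapping(symbols, pattern):
    """Greedy left-to-right count of non-overlapping occurrences of pattern."""
    m, i, count = len(pattern), 0, 0
    while i <= len(symbols) - m:
        if tuple(symbols[i:i + m]) == pattern:
            count += 1
            i += m
        else:
            i += 1
    return count


def discover_patterns(symbols, min_len=2, min_count=2):
    """Repeated patterns taken from the LCPs of lexicographically adjacent suffixes.

    Each adjacent-suffix LCP corresponds to a branching node of a PAT tree over
    the same sequence; sorting suffixes here stands in for building the tree.
    """
    n = len(symbols)
    order = sorted(range(n), key=lambda i: symbols[i:])  # naive suffix array
    scored = {}
    for a, b in zip(order, order[1:]):
        lcp = 0
        while a + lcp < n and b + lcp < n and symbols[a + lcp] == symbols[b + lcp]:
            lcp += 1
        if lcp < min_len:
            continue
        pattern = tuple(symbols[b:b + lcp])
        count = count_non_overlapping(symbols, pattern)
        if count >= min_count:
            scored[pattern] = count * lcp  # coverage: tokens explained by the pattern
    # Highest coverage first; longer pattern breaks ties.
    return sorted(scored, key=lambda p: (-scored[p], -len(p)))


def extract_records(tokens, pattern):
    """Return the TEXT values inside each non-overlapping occurrence of pattern."""
    symbols = [sym for sym, _ in tokens]
    m, i, records = len(pattern), 0, []
    while i <= len(symbols) - m:
        if tuple(symbols[i:i + m]) == pattern:
            records.append([text for _, text in tokens[i:i + m] if text is not None])
            i += m
        else:
            i += 1
    return records


if __name__ == "__main__":
    page = "<table><tr><td>A</td></tr><tr><td>B</td></tr><tr><td>C</td></tr></table>"
    tokens = tokenize(page)
    best = discover_patterns([s for s, _ in tokens])[0]
    print(best)                           # ('<tr>', '<td>', 'TEXT', '</td>', '</tr>')
    print(extract_records(tokens, best))  # [['A'], ['B'], ['C']]
```

In this toy table, overlapping repeats such as two consecutive rows are discarded because they have only one non-overlapping occurrence. The single-row pattern therefore wins on coverage, and no training example is needed to find it. A real system would also need to handle noise, such as optional fields or nested records, which this sketch ignores.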