Web Content Extraction & Its Data Management Method

ZHANG Cheng-hong, XIAO Jun-jian, ZHANG Cheng
DOI: https://doi.org/10.3969/j.issn.0427-7104.2001.02.012
2001-01-01
Abstract: With the development of the Internet and its related technologies, the WWW has become the largest information space. For enterprises and individuals alike, the Web is gradually becoming the main source of information. However, the sheer number of web sites and the resulting information overload make it increasingly difficult to obtain useful information. Search engines only narrow the scope of a search; the specific information must still be located manually. Because Web information is unstructured or semi-structured, analysis tools cannot be applied to it directly. A method is therefore needed that automatically extracts Web content and structures Web data, simplifying the process of obtaining information and facilitating its analysis. This paper describes such a method in detail.