Dynamical Data Regions Identification and Extraction in Web Pages

HUANG Jianbin, JI Hongbing, SUN Heli
DOI: https://doi.org/10.3969/j.issn.1000-3428.2007.11.020
2007-01-01
Abstract: This paper presents an improved approach for finding data blocks in the HTML tag tree in order to mine the data regions embedded in a Web page. A strategy that combines Web page clustering with cross-page data region analysis is proposed to identify dynamic Web data regions. Experimental results show the effectiveness of the proposed approach.