Learning web page block functions using roles of images

Yang Xin, Shi Yuanchun
DOI: https://doi.org/10.1109/ICPCA.2008.4783565
2008-01-01
Abstract: Making use of block information in Web IR and Data Mining tasks calls for a good understanding of the function of each block. Existing work on classifying block functions and judging block importance has not made full use of the image factor, and only simple image features have been considered. We regard images as strong indicators of Web page blocks with various functions and propose to learn block functions using roles of images as part of the block features. Blocks are generated by Web page segmentation, and roles of images are decided automatically by image classification. We experiment on 140 Web pages and demonstrate that utilizing roles of images can significantly improve classification quality when learning Web page block functions. We also measure the usefulness of different roles of images and evaluate the effect of two page segmentation methods. © 2008 IEEE.