Semantic HTML Page Segmentation Using Type Analysis

Xin Yang, Peifeng Xiang, Yuanchun Shi
DOI: https://doi.org/10.1109/spca.2006.297506
2006-01-01
Abstract: Semantic information is necessary for Semantic Web processing and is useful to Web adaptation services such as personalizing users' browsing activities on small-screen devices. However, in most existing HTML documents, semantic information is encoded only implicitly. This paper describes a page segmentation method that parses Web pages into rectangular segments carrying some semantic information, called blocks. Existing page segmentation techniques are mainly built on the HTML DOM structure or are purely vision-based, and are not accurate enough in either visual presentation or semantic sense. Our approach is automatic and is based on a refined typing system that tightly couples type analysis with indispensable visual cues to organize blocks into a tree structure, aiming to achieve a high degree of coherence in both the semantic and visual views. Experimental results show that our method achieves better accuracy and completeness than existing ones.