Building Web Page Logical Structure Model towards Effective Metadata Extraction

Baoyao Zhou, Ming Zhang
DOI: https://doi.org/10.1109/APWeb.2010.81
2010-01-01
Abstract: Web pages are typical semi-structured data. Several tree-based models have been proposed to describe the semantic content structure of web pages in order to facilitate further content analysis. However, most existing models capture only the segmentation hierarchy of content blocks, not the semantic relationships among them. In this work, we propose a novel web page semantic structure model, called the Logical Structure Model, which presents more comprehensive structural information about web pages. Based on this model, hidden patterns in web content can be revealed more easily. The proposed model has been used to facilitate the identification of course metadata in our Online Course Organization project, which aims to build an online course portal that serves course information obtained from the Web.
What problem does this paper attempt to address?