Extracting Academic Information from Conference Web Pages

Peng Wang, Yue You, Baowen Xu, Jianyu Zhao
DOI: https://doi.org/10.1109/ICTAI.2011.164
2011-01-01
Abstract: Conference Web pages are the main platforms for sharing conference information and organizing conference events. To discover academic knowledge from such Web pages for building academic ontologies or social networks, it is necessary to extract the academic information they contain. This paper proposes an approach for doing so. First, Web pages are segmented into text blocks by analyzing their visual features and DOM structure. Then a Bayes network is used to classify these text blocks into predefined categories, and post-processing improves the quality of the initial classification results. Finally, the academic information is extracted from the classified text blocks. Our experimental results on real-world datasets show that the proposed method is highly effective and efficient for extracting academic information from conference Web pages, achieving on average 90% precision and 89% recall.
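The three stages named in the abstract (segmentation, block classification, extraction) can be illustrated with a minimal sketch. It is not the authors' implementation. It segments on DOM block tags only and ignores visual features, it uses a simple keyword score as a stand-in for the Bayes network, and it omits the post-processing step. The names BlockSegmenter, KEYWORDS, classify, and extract, the category labels, and the sample page are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags mark text-block boundaries (DOM structure only).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}


class BlockSegmenter(HTMLParser):
    """Split a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Collapse whitespace and store the block if it is non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


# Hypothetical categories and cue words; a keyword score replaces the
# Bayes network described in the abstract.
KEYWORDS = {
    "dates": ("deadline", "submission", "notification"),
    "topics": ("topics", "areas of interest"),
    "committee": ("chair", "committee"),
}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(r"(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}")


def classify(block):
    """Return the category with the most cue-word hits, or 'other'."""
    lowered = block.lower()
    scores = {label: sum(lowered.count(k) for k in cues)
              for label, cues in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def extract(html):
    """Segment, classify, then pull date strings from 'dates' blocks."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    by_label = {}
    for block in segmenter.blocks:
        by_label.setdefault(classify(block), []).append(block)
    dates = [d for b in by_label.get("dates", []) for d in DATE_RE.findall(b)]
    return by_label, dates


# Made-up sample page, for illustration only.
page = ("<h1>ExampleConf 2011</h1>"
        "<p>Submission deadline: June 15, 2011</p>"
        "<div>Program Chair: A. Person</div>")
by_label, dates = extract(page)
print(by_label)
print(dates)
```

A real system would replace classify with a trained probabilistic classifier over block features and would add the post-processing pass that the abstract credits with improving the initial classification.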