Automatic Web News Extraction with Semantic Features

SHI Yang, ZHANG Qi, HUANG Xuan-jing
DOI: https://doi.org/10.3969/j.issn.1000-3428.2010.07.059
2010-01-01
Abstract: This paper analyzes the semantic features and the similarity of Web news pages, and presents an automatic Web news extraction method based on semantic features. The method uses a semantic classifier to find seed information, and uses portion features to build information extraction rules. When semantic features are added to the classifier, the F1 value of Web news extraction reaches 94.2%. When the semantic classifier is combined with the portion-feature-based information extraction method, the F1 value reaches 96.9%. Experimental results show that the method effectively improves the accuracy of Web information extraction and reduces the cost of manual labeling.
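The abstract reports its results as F1 values. For readers unfamiliar with the metric, the following is a minimal sketch of how F1 is commonly computed from a set of extracted items and a gold-standard set. The function name `f1_score` and the sample item labels are hypothetical and used only for illustration; the abstract does not describe the paper's own evaluation procedure in this detail.

```python
def f1_score(extracted, gold):
    """Return (precision, recall, F1) for extracted items against a gold set.

    Standard definition: F1 = 2 * P * R / (P + R). This is an illustrative
    sketch, not the evaluation code used in the paper.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: page blocks that belong to the news item (gold)
# versus the blocks an extractor returned.
gold = {"title", "date", "body_p1", "body_p2"}
extracted = {"title", "body_p1", "body_p2", "ad_banner"}

precision, recall, f1 = f1_score(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case three of the four extracted blocks are correct and three of the four gold blocks are found, so precision, recall, and F1 are all 0.75.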