News Information Extraction Based On Adaptive Weighting Using Unsupervised Bayesian Algorithm

Shilin Huang, Xiaolin Zheng, Xiaowei Wang, Deren Chen
DOI: https://doi.org/10.1007/978-3-642-23982-3_32
2011-01-01
Abstract: Information extraction is an important part of web information retrieval. News pages lack representative keywords that mark where an article begins and ends, so it is difficult to identify the news title and body automatically. To solve this problem, our approach applies an adaptive weighting factor within a Bayesian algorithm. We divide a news page into text fragments and represent each fragment with a set of content features and layout features. The adaptive weighting factor adjusts these features to fit different pages. Experiments show that, on the task of news information extraction, our method achieves higher precision than the original algorithm without a weighting factor.
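
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, assuming a weighted naive Bayes classifier over text fragments. The feature names, thresholds, probability tables, and the per-page weighting rule `4 * p * (1 - p)` are all hypothetical illustrations and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    font_size: float   # layout feature (hypothetical)
    link_ratio: float  # share of characters inside links (hypothetical)


def features(frag):
    """Map a fragment to binary features; thresholds are illustrative only."""
    return {
        "long_text": len(frag.text) > 80,    # content feature
        "large_font": frag.font_size >= 18,  # layout feature
        "few_links": frag.link_ratio < 0.2,  # content feature
    }


# Assumed P(feature is true | class); invented numbers, not from the paper.
LIKELIHOOD = {
    "body":  {"long_text": 0.9, "large_font": 0.1, "few_links": 0.9},
    "title": {"long_text": 0.1, "large_font": 0.9, "few_links": 0.8},
    "other": {"long_text": 0.2, "large_font": 0.2, "few_links": 0.3},
}
PRIOR = {"body": 0.3, "title": 0.1, "other": 0.6}


def page_weights(frags):
    """Assumed adaptation rule: a feature that is true for nearly all or
    nearly none of a page's fragments says little on that page, so its
    weight 4*p*(1-p) shrinks toward 0 (p = share of fragments where it holds).
    """
    n = len(frags)
    weights = {}
    for name in features(frags[0]):
        p = sum(features(f)[name] for f in frags) / n
        weights[name] = 4 * p * (1 - p)
    return weights


def classify(frag, weights):
    """Weighted naive Bayes: each feature's log-likelihood is scaled by its
    page-specific weight before being added to the log-prior."""
    scores = {}
    for label, prior in PRIOR.items():
        score = math.log(prior)
        for name, value in features(frag).items():
            p = LIKELIHOOD[label][name]
            score += weights[name] * math.log(p if value else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)


page = [
    Fragment("Local team wins final", font_size=24, link_ratio=0.0),
    Fragment("x" * 200, font_size=12, link_ratio=0.05),
    Fragment("Home | World | Sports", font_size=12, link_ratio=0.9),
    Fragment("Contact | Terms", font_size=10, link_ratio=0.8),
]
weights = page_weights(page)
labels = [classify(f, weights) for f in page]
```

Because the weights are recomputed from the fragments of each page, no labeled pages are needed for the adaptation step itself. The likelihood tables would still have to come from somewhere, and how the paper obtains them is not stated in the abstract.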