A METHOD FOR WEB INFORMATION EXTRACTION BASED ON MULTI-LEARNING STRATEGIES

Zhu Ming, Li Xiang, Zheng Quan
DOI: https://doi.org/10.3969/j.issn.1000-386X.2008.12.023
2008-01-01
Abstract: Current information extraction methods suffer from poor applicability because content on the Internet is heterogeneous and dynamic. A method based on multi-learning strategies (MLS) is proposed for Web information extraction (IE); it combines two types of algorithms, one based on a conventional text classifier and one based on Hidden Markov Models (HMM). Starting from a locally optimal classification of each fragment, the method refines the IE result using the relevant structural information present in the document. Experimental results show that the MLS method achieves higher accuracy and recall in IE without needing to learn new Websites, and has strong applicability.
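
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of refining per-fragment classifier output with an HMM-style sequence model. It is not the authors' implementation. The label set, the probability tables, and the function names (`local_argmax`, `viterbi_refine`) are all hypothetical and chosen purely for illustration. The classifier's per-fragment probabilities are treated as emission scores, and a transition table stands in for the document's structural information.

```python
import math

# Hypothetical label set for fragments of a Web page (an assumption).
LABELS = ["name", "price", "other"]

# Illustrative per-fragment probabilities, as a text classifier might output.
LOCAL_PROBS = [
    {"name": 0.8, "price": 0.1, "other": 0.1},
    {"name": 0.1, "price": 0.4, "other": 0.5},  # locally looks like "other"
    {"name": 0.1, "price": 0.1, "other": 0.8},
]

# Illustrative structural prior: a name is usually followed by a price.
TRANSITIONS = {
    "name":  {"name": 0.1, "price": 0.8, "other": 0.1},
    "price": {"name": 0.3, "price": 0.1, "other": 0.6},
    "other": {"name": 0.4, "price": 0.1, "other": 0.5},
}
START = {"name": 0.4, "price": 0.1, "other": 0.5}


def local_argmax(local_probs):
    """Label each fragment independently with its most probable class."""
    return [max(LABELS, key=lambda lab: p[lab]) for p in local_probs]


def viterbi_refine(local_probs, transitions, start):
    """Find the label sequence with the highest joint score (Viterbi decoding).

    The joint score multiplies start, transition, and per-fragment
    probabilities; logs are used to avoid numerical underflow.
    """
    # best[i][lab]: best log-score of any path ending at fragment i with lab.
    best = [{lab: math.log(start[lab]) + math.log(local_probs[0][lab])
             for lab in LABELS}]
    back = []  # back[i-1][lab]: best predecessor of lab at fragment i
    for i in range(1, len(local_probs)):
        scores, pointers = {}, {}
        for lab in LABELS:
            prev = max(LABELS,
                       key=lambda p: best[-1][p] + math.log(transitions[p][lab]))
            scores[lab] = (best[-1][prev]
                           + math.log(transitions[prev][lab])
                           + math.log(local_probs[i][lab]))
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(local_argmax(LOCAL_PROBS))
print(viterbi_refine(LOCAL_PROBS, TRANSITIONS, START))
```

With these made-up numbers, the independent classification labels the second fragment "other", while the sequence-level decoding relabels it "price" because a price is likely to follow a name. This is one way structural context can correct a locally optimal but globally inconsistent decision.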