Multi-Strategy Combination Information Extraction Method

Ming-jun XIAO, Wei ZHANG, Xiang ZOU, Qing-Sheng CAI
DOI: https://doi.org/10.3969/j.issn.1000-1220.2005.04.013
2005-01-01
Abstract: This paper introduces MSCIE (Multi-Strategy Combination Information Extraction), a method that combines multiple strategies for information extraction. MSCIE divides information extraction from tabular web pages into two strategies: extraction based on analysis of web page structure features, and extraction based on pattern matching. It also proposes a method for pruning redundant information from the DOM (Document Object Model) trees of web pages and a feature pattern discovery algorithm, used by the two strategies respectively, and accomplishes extraction tasks through the cooperation of the two strategies. Applied in an Internet-based Competitive Intelligence System, MSCIE extracted supply and demand information for many products from a large number of web sites and achieved high precision and recall (95% on average).
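The abstract does not give the algorithms themselves, so the following is only an illustrative sketch of how a structure-based pass and a pattern-matching pass might cooperate on a tabular page. It is not the authors' MSCIE implementation. The redundant-tag list, the `TableRowExtractor` class, the `PRICE_PATTERN` expression, the `extract_records` function, and the sample page are all assumptions introduced for illustration; in particular, the hand-written pattern stands in for whatever the feature pattern discovery algorithm would produce, and simply ignoring a few tags stands in for DOM tree pruning.

```python
import re
from html.parser import HTMLParser

# Tags treated as redundant for extraction (an assumption for this sketch,
# standing in for the paper's DOM pruning step).
REDUNDANT_TAGS = {"script", "style", "noscript"}

# Hand-written stand-in for a discovered feature pattern: "<number> <currency>/<unit>".
PRICE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(USD|RMB)/(\w+)")


class TableRowExtractor(HTMLParser):
    """Structure-based pass: collect the cell text of every table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row being read, or None
        self._cell = None      # text fragments of the cell being read, or None
        self._skip_depth = 0   # greater than 0 while inside a redundant subtree

    def handle_starttag(self, tag, attrs):
        if tag in REDUNDANT_TAGS:
            self._skip_depth += 1
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in REDUNDANT_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Keep text only when inside a cell and outside redundant subtrees.
        if self._skip_depth == 0 and self._cell is not None:
            self._cell.append(data)


def extract_records(html):
    """Pattern-matching pass: keep rows in which some cell matches the price pattern."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    records = []
    for row in parser.rows:
        for cell in row:
            match = PRICE_PATTERN.search(cell)
            if match:
                records.append({
                    "product": row[0],  # assumes the first column names the product
                    "price": float(match.group(1)),
                    "currency": match.group(2),
                    "unit": match.group(3),
                })
                break
    return records


# Hypothetical sample page, invented for this sketch.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Product</th><th>Price</th></tr>"
    "<tr><td>Steel pipe<script>track()</script></td><td>420 USD/ton</td></tr>"
    "</table>"
)

records = extract_records(SAMPLE_HTML)
```

In this sketch the structural pass narrows the page down to table cells and the pattern pass decides which rows carry usable supply or demand data, so header rows and script content drop out without site-specific rules.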