Research on an Intelligent Semantic-Based Information Extraction Framework
Shuangyang Li, Zhengqiu Yang, Jiapeng Xiu, Chen Liu
Abstract: Research on web information extraction has made progress in recent years, but the extraction performance of existing systems still needs considerable improvement. To address this problem, this paper proposes a new intelligent framework for web information extraction. The framework provides a mechanism for automatically generating web information extraction rules, and it intelligently associates instances with the user's customized semantic extraction requirements. It makes full use of the structured, hierarchical features of web design templates: web pages are converted into XML documents through crawling, purification, and processing, and public extraction rules are then derived from XPath positioning information. Experimental results show that the method extracts customized web information accurately, quickly, and efficiently.
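To make the rule-generation idea more concrete, the following Python sketch shows one way to derive public XPath rules from template-generated pages. It assumes that the pages have already been crawled and purified into well-formed XML, and that the user states a customized requirement by giving a few example values for each semantic field. The sketch records the positional XPath of each labeled instance and merges these paths into a single rule by dropping position indices where the instances disagree. This generalization strategy, the sample pages, and the helper names (`node_path`, `generalize`, `learn_rules`, `apply_rules`) are hypothetical illustrations, not the framework's published algorithm. Only the limited XPath subset supported by Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample pages assumed to share one design template. They are
# already "purified" into well-formed XML; a real pipeline would tidy raw HTML first.
PAGE_1 = """<html><body>
<div class="nav"><a>Home</a></div>
<div class="list">
  <div class="item"><span>Phone A</span><span>199</span></div>
  <div class="item"><span>Phone B</span><span>249</span></div>
  <div class="item"><span>Phone C</span><span>299</span></div>
</div>
</body></html>"""

PAGE_2 = """<html><body>
<div class="nav"><a>Home</a></div>
<div class="list">
  <div class="item"><span>Laptop X</span><span>899</span></div>
  <div class="item"><span>Laptop Y</span><span>1099</span></div>
</div>
</body></html>"""


def node_path(root, target):
    """Positional path from root to target, e.g. 'body[1]/div[2]/div[1]/span[1]'."""
    parent = {child: p for p in root.iter() for child in p}
    steps = []
    node = target
    while node is not root:
        p = parent[node]
        # XPath positions count only same-tag siblings and start at 1.
        same_tag = [c for c in p if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = p
    return "/".join(reversed(steps))


def generalize(paths):
    """Merge instance paths into one public rule by dropping indices that differ."""
    split = [p.split("/") for p in paths]
    if len({len(s) for s in split}) != 1:
        raise ValueError("instances lie at different depths")
    rule = []
    for steps in zip(*split):
        if len(set(steps)) == 1:
            rule.append(steps[0])  # all instances agree: keep the exact step
        else:
            tags = {s.split("[")[0] for s in steps}
            if len(tags) != 1:
                raise ValueError("instances differ in tag at the same depth")
            rule.append(tags.pop())  # same tag, different position: match any
    return "./" + "/".join(rule)


def learn_rules(xml_text, examples):
    """Map each user-defined semantic field to a rule learned from example values."""
    root = ET.fromstring(xml_text)
    rules = {}
    for field, values in examples.items():
        # Locate the element whose text equals each labeled example value.
        nodes = [next(e for e in root.iter() if e.text == v) for v in values]
        rules[field] = generalize([node_path(root, n) for n in nodes])
    return rules


def apply_rules(xml_text, rules):
    """Extract every field from a page using the learned XPath rules."""
    root = ET.fromstring(xml_text)
    return {field: [e.text for e in root.findall(rule)] for field, rule in rules.items()}


rules = learn_rules(PAGE_1, {"name": ["Phone A", "Phone B"], "price": ["199", "249"]})
print(rules)
print(apply_rules(PAGE_2, rules))
```

Here two labeled names and two labeled prices on the first page yield the rules `./body[1]/div[2]/div/span[1]` and `./body[1]/div[2]/div/span[2]`, which are then reused on a second page from the same template. Because template-generated pages share one tree structure, a rule learned on one page can transfer to others. A page whose layout departs from the template would need its own examples.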