Analysis and Improvement of Data Extraction Technology on the Web

Li Bi
DOI: https://doi.org/10.1109/ebiss.2010.5473712
2010-01-01
Abstract: The paper introduces an improved technology and infrastructure to support the effective flow of information among sources and services on the Web, and to connect them with legacy systems that were designed to operate with traditional relational databases. The technology works as a relational front end to semi-structured data sources: it extracts data from web pages using declarative specification files whose extraction rules are expressed as regular expressions.
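
The abstract does not give the specification file format, so the following is only a minimal sketch of the general idea, under stated assumptions. The `SPEC` dictionary, its keys (`table`, `row`, `columns`), the sample HTML, and the helper names `extract_rows`, `load_into_table`, and `demo` are all hypothetical and are not taken from the paper. The sketch shows how regular-expression rules kept in a declarative structure could drive extraction, and how the extracted rows could be exposed through an ordinary SQL query.

```python
import re
import sqlite3

# Hypothetical declarative specification (not the paper's actual format):
# one regex that delimits a record, and one regex per relational column.
SPEC = {
    "table": "books",
    "row": r"<tr>(.*?)</tr>",
    "columns": {
        "title": r'<td class="title">(.*?)</td>',
        "price": r'<td class="price">(.*?)</td>',
    },
}

# Made-up sample page used only for illustration.
SAMPLE_HTML = """
<table>
<tr><td class="title">Dune</td><td class="price">9.99</td></tr>
<tr><td class="title">Emma</td><td class="price">7.50</td></tr>
</table>
"""


def extract_rows(html, spec):
    """Apply the spec's regex rules to the page and return a list of dicts."""
    rows = []
    for row_match in re.finditer(spec["row"], html, re.DOTALL):
        fragment = row_match.group(1)
        record = {}
        for column, pattern in spec["columns"].items():
            match = re.search(pattern, fragment, re.DOTALL)
            # A column whose rule does not match is stored as NULL.
            record[column] = match.group(1).strip() if match else None
        rows.append(record)
    return rows


def load_into_table(rows, spec, conn):
    """Create a table named by the spec and insert the extracted rows."""
    columns = list(spec["columns"])
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {spec['table']} ({col_list})")
    conn.executemany(
        f"INSERT INTO {spec['table']} ({col_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


def demo():
    """Extract the sample page and query it relationally."""
    conn = sqlite3.connect(":memory:")
    load_into_table(extract_rows(SAMPLE_HTML, SPEC), SPEC, conn)
    query = "SELECT title FROM books WHERE CAST(price AS REAL) < 8"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(demo())
```

Keeping the rules in data rather than in code means that supporting a new page layout would only require a new specification, while callers continue to issue SQL. An in-memory SQLite database stands in here for whatever relational interface a legacy system expects.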