Analysis and Comparison of Web Information Extraction Technologies

SONG Xinying, ZHAO Tiejun
2013-01-01
Abstract: The World Wide Web has become an important information resource owing to its explosive growth and spread over the past two decades. The tremendous amount of web data has opened a new era for data analysis and mining systems, and more and more web applications need to extract, mine, and integrate data from enormous numbers of data sources. However, because web pages are semi-structured, the data they display is not directly consumable by machines. Web information extraction aims to extract structured data from web pages, a very challenging problem given the large scale and high heterogeneity of web data. This paper introduces state-of-the-art web information extraction studies, analyzes the advantages and limitations of each method, and categorizes and compares existing approaches.
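To make "extracting structured data from semi-structured web pages" concrete, the following is a minimal sketch of a hand-written wrapper, one classic family of extraction approaches. It is an illustration only, not a method taken from the paper: the page layout, the CSS class names `item`, `name`, and `price`, and the helper names `ItemWrapper` and `extract` are all hypothetical. The sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical semi-structured page: the same template repeats once per record.
PAGE = """
<html><body>
  <div class="item"><span class="name">Laptop</span><span class="price">$999</span></div>
  <div class="item"><span class="name">Phone</span><span class="price">$599</span></div>
</body></html>
"""

class ItemWrapper(HTMLParser):
    """Hand-written wrapper for the assumed layout: each div.item is one
    record, and span.name / span.price hold its fields."""

    FIELDS = ("name", "price")

    def __init__(self):
        super().__init__()
        self.records = []   # structured records extracted so far
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "item":
            self.records.append({})  # a new record starts
        elif tag == "span" and cls in self.FIELDS and self.records:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            record = self.records[-1]
            # Accumulate in case the parser delivers text in several chunks.
            record[self._field] = record.get(self._field, "") + data

def extract(html):
    """Return a list of dicts, one per record found in the page."""
    parser = ItemWrapper()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in r.items()} for r in parser.records]

if __name__ == "__main__":
    for record in extract(PAGE):
        print(record)
```

Because the extraction rule is tied to a single page template, it would have to be rewritten for every differently laid-out site, which illustrates why the scale and heterogeneity of web data make the problem hard.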