Research on the Major Technologies of Fully Automatic Wrapper Generation for Web Information Extraction

MEI Xue, CHENG Xue-qi, GUO Yan, ZHANG Gang, DING Guo-dong
DOI: https://doi.org/10.16353/j.cnki.1000-7490.2010.01.005
2010-01-01
Abstract: There are many wrapper generation methods for Web information extraction. By degree of automation, they fall into three categories: manual, semi-automatic, and fully automatic. This paper studies the main technologies of fully automatic wrapper generation for Web information extraction. First, a corresponding classification system is constructed. Second, 15 major fully automatic wrapper generation technologies from recent years are analyzed qualitatively and compared by category. Finally, five development trends are summarized.