On web page extraction based on position of DIV

Liu Xunhua, Li Hui, Wu Dan, Huang Jiaqing, Wang Wei, Yu Li, Wu Ye, Xie Hengjun
DOI: https://doi.org/10.1109/ICCAE.2010.5451751
2010-01-01
Abstract: For the DIV-based page layout popular in Web pages, this paper presents a DIV-position-based method for extracting the main text from the body of a Web page by reconstructing the page, retaining the atomic DIVs, and analyzing DIV positions. Experiments showed that extraction accuracy can exceed 90% and that the method is highly versatile. ©2010 IEEE.
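
The abstract names three steps (reconstruct, keep atomic DIVs, analyze DIV position) without giving their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes an "atomic DIV" is a DIV that contains no nested DIV, uses document order as a rough stand-in for position, and picks the atomic DIV with the most text as a placeholder for the paper's position analysis. The names AtomicDivCollector and extract_main_text are hypothetical.

```python
from html.parser import HTMLParser


class AtomicDivCollector(HTMLParser):
    """Collect text from atomic DIVs (assumption: a DIV with no nested DIV)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one frame per currently open DIV
        self.atomic = []   # (document-order index, text) per atomic DIV
        self.order = 0     # running index of DIVs in document order
        self.skip = 0      # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag == "div":
            if self.stack:
                # The enclosing DIV has a nested DIV, so it is not atomic.
                self.stack[-1]["has_child"] = True
            self.stack.append(
                {"index": self.order, "text": [], "has_child": False}
            )
            self.order += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == "div" and self.stack:
            frame = self.stack.pop()
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join(" ".join(frame["text"]).split())
            if not frame["has_child"] and text:
                self.atomic.append((frame["index"], text))

    def handle_data(self, data):
        # Text belongs to the innermost open DIV only.
        if self.stack and self.skip == 0:
            self.stack[-1]["text"].append(data)


def extract_main_text(html):
    """Return (index, text) of the atomic DIV with the most text, or None.

    Placeholder heuristic: the paper's actual position criteria are not
    stated in the abstract, so text length stands in for them here.
    """
    collector = AtomicDivCollector()
    collector.feed(html)
    collector.close()
    if not collector.atomic:
        return None
    return max(collector.atomic, key=lambda item: len(item[1]))
```

Calling extract_main_text on a page's HTML string returns the document-order index and text of the selected DIV. A faithful implementation would replace the length heuristic with the rendered-position rules described in the paper itself.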