Relation Links Extracted Approach Based on Blocking Links

WANG Fang, YU Hao, TAN Hong-ye, ZHAO Tie-jun
DOI: https://doi.org/10.3321/j.issn:1002-8331.2006.31.034
2006-01-01
Computer Engineering and Applications Journal
Abstract: A web page contains many hyperlinks, including relation links and noisy links. This paper proposes a novel approach that extracts relation links from a page on the basis of link blocks. The approach has two steps. First, the page is partitioned into blocks according to its HTML table tags, and links are extracted from each block, yielding a set of link blocks. Second, the relation link block is identified by rules; for instance, relation links belong to one block, and their anchor text shares words with the title of the page on which they appear. Experimental results show that the method is effective, with precision above 85% and recall of about 70%.
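The two steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each `<table>` element is treated as one block, and nested tables form their own blocks. Words are lowercase alphanumeric tokens longer than two characters; Chinese pages would need a word segmenter instead. The block-selection rule and its `min_share` threshold of 0.5 are hypothetical stand-ins for the paper's rules, whose details the abstract does not give. The class `LinkBlockParser` and the functions `words` and `relation_links` are likewise illustrative names.

```python
import re
from html.parser import HTMLParser


class LinkBlockParser(HTMLParser):
    """Step 1 (sketch): split a page into blocks, one per <table> element
    (nested tables become separate blocks), and collect (href, anchor_text)
    pairs for the links inside each block. Block 0 holds links outside tables."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self._stack = [[]]            # innermost open block is self._stack[-1]
        self.blocks = [self._stack[0]]
        self._href = None             # href of the <a> currently open, if any
        self._anchor = []             # text fragments of the current anchor

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            block = []
            self._stack.append(block)
            self.blocks.append(block)
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "table" and len(self._stack) > 1:
            self._stack.pop()
        elif tag == "a" and self._href is not None:
            text = " ".join(" ".join(self._anchor).split())
            self._stack[-1].append((self._href, text))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._anchor.append(data)


def words(text):
    """Hypothetical tokenizer: lowercase word tokens longer than two characters."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) > 2}


def relation_links(html, min_share=0.5):
    """Step 2 (sketch): return the single block in which the largest share of
    anchors (at least min_share) has a word in common with the page title."""
    parser = LinkBlockParser()
    parser.feed(html)
    parser.close()
    title_words = words(parser.title)
    best, best_score = [], 0.0
    for block in parser.blocks:
        if not block:
            continue
        hits = sum(1 for _, text in block if words(text) & title_words)
        score = hits / len(block)
        if score >= min_share and score > best_score:
            best, best_score = block, score
    return best
```

On a page titled "Python web crawler tutorial", a table of navigation links such as "Home" and "Login" scores 0. A table of article links whose anchors mention "web" or "crawler" is returned as the relation link block. Returning the whole block, rather than filtering individual links, follows the abstract's idea that relation links are grouped in one block.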