Algorithm for Webpage Semantic Blocks Mining Using Tree Match Method

LIU Shou-qun, ZHU Ming, TAN Xiao-bin
2009-01-01
Abstract: On the WWW, many web documents are composed of various semantic regions. Discovering and mining such regions is of significant value for web page analysis, for improving the user's browsing experience, and for similar tasks. However, because page structure and content differ widely across large numbers of web pages, it is hard to detect such common regions effectively and correctly, and traditional matching methods such as regular expressions are not suitable for this problem. This paper proposes a region detection method based on a tree matching algorithm. Experiments show that the proposed method improves the F-measure and also reduces computation cost.
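The abstract does not say which tree matching algorithm the authors use, so the following is only an illustrative sketch of the general idea, using the well-known Simple Tree Matching recurrence as a stand-in. The `Node` class, the function names, and the normalized `similarity` score are all assumptions made for this example and are not taken from the paper. Two DOM subtrees that score close to 1.0 would be candidates for the same kind of semantic region.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Number of node pairs in a maximum top-down, order-preserving matching."""
    if a.tag != b.tag:
        return 0  # roots differ, so nothing below them may be matched
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching between the first i children of a
    # and the first j children of b (same shape as an LCS table).
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 counts the matched roots


def tree_size(t: Node) -> int:
    """Total number of nodes in the tree rooted at t."""
    return 1 + sum(tree_size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched pairs relative to the average tree size."""
    return 2 * simple_tree_match(a, b) / (tree_size(a) + tree_size(b))


# Two news-item-like blocks that differ by one extra paragraph.
block_a = Node("div", [Node("h2"), Node("p"), Node("a")])
block_b = Node("div", [Node("h2"), Node("p"), Node("p"), Node("a")])
# A structurally unrelated block.
block_c = Node("ul", [Node("li"), Node("li")])

print(simple_tree_match(block_a, block_b))
print(similarity(block_a, block_b))
print(similarity(block_a, block_c))
```

Unlike a regular expression over the raw HTML, this comparison works on the tree structure, so small variations such as an extra paragraph lower the score slightly rather than breaking the match outright.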