Automatic Extraction Of Commodity Attributes On Webpages Based On Hierarchical Structure

Zhi Yu, Meiyan Li, Wei Wang, Can Wang
DOI: https://doi.org/10.1142/9789814689007_0046
2015-01-01
Abstract: Every good on an e-commerce website has many commodity attributes, some of which are hidden behind the user's dynamic interaction. Finding them for further analysis is challenging, and how to extract commodity attributes effectively has recently become an attractive research topic. Previous work is too computationally expensive to handle this problem effectively, since it relies on complex operations such as semantic analysis to extract hidden attributes. In this article, we analyze the relationship between URL terms and commodity attributes, discuss the hierarchical structure of query results, and propose a new Commodity Attributes Extraction algorithm based on a Single-Layer method to find the hidden attributes denoted by URL terms. Using these URL terms, we can easily find the hidden attributes of a new query result page by analyzing its URL. Experimental results demonstrate the effectiveness of our algorithm.
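To make the final step concrete, the following is a minimal sketch of reading attributes off a query result URL, assuming a mapping from URL terms to attributes has already been learned. It is not the paper's Single-Layer algorithm, which the abstract does not specify. The mapping `TERM_TO_ATTRIBUTE`, the function `extract_hidden_attributes`, the parameter names, and the example URL are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical mapping from (URL term, value) to (attribute name, attribute value).
# Assumed to have been learned beforehand; the entries here are invented.
TERM_TO_ATTRIBUTE = {
    ("brand", "1001"): ("Brand", "Acme"),
    ("color", "3"): ("Color", "Red"),
}


def extract_hidden_attributes(url, term_map=TERM_TO_ATTRIBUTE):
    """Return the attributes denoted by known URL terms in a query result URL."""
    query = urlsplit(url).query
    attributes = {}
    # parse_qsl yields (key, value) pairs in the order they appear in the URL.
    for key, value in parse_qsl(query, keep_blank_values=True):
        hit = term_map.get((key, value))
        if hit is not None:
            name, label = hit
            attributes[name] = label
    return attributes


# Example URL (hypothetical): unknown terms such as q and page are ignored.
example_url = "https://shop.example.com/search?q=shoes&brand=1001&color=3&page=2"
result = extract_hidden_attributes(example_url)
print(result)  # {'Brand': 'Acme', 'Color': 'Red'}
```

Once such a mapping exists, a lookup like this needs only the URL string, which is consistent with the abstract's claim that hidden attributes can be found without semantic analysis of the page.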