Extracting Novel Features for E-Commerce Page Quality Classification.
Jing Wang, Lanfen Lin, Feng Wang, Penghua Yu, Jiaolong Liu, Xiaowei Zhu
DOI: https://doi.org/10.1007/978-3-642-53914-5_41
2013-01-01
Abstract: E-commerce websites host a huge number of web pages describing the same product, and their quality varies greatly. There is therefore a growing need for automated, accurate, and efficient quality classification methods. Several link-based, click-based, and content-based approaches have been proposed to evaluate page quality for general search engines. However, these methods consider only the surface features of HTML documents. Moreover, features such as link relations have drawbacks when applied to e-commerce pages, because the hypothesis that links imply endorsement does not always hold in an e-commerce setting. In this paper, we propose two kinds of features that directly indicate content quality. We analyze each page's content structure using a corpus of labeled texts, and we evaluate property completeness with the help of an ontology. We then combine these features with other features commonly used in the literature, and apply several learning methods to train classifiers that label pages as good or bad. Experiments on real e-commerce pages show that the proposed novel features can greatly improve classification accuracy. © Springer-Verlag 2013.
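The abstract does not say how property completeness is computed, so the following is only a minimal sketch of one plausible reading: the share of ontology-defined properties for a product category that a page actually covers. The `ONTOLOGY` table, the `property_completeness` function, and the sample property names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set

# Hypothetical ontology (assumption): product category -> properties a
# complete product page would be expected to describe.
ONTOLOGY: Dict[str, Set[str]] = {
    "laptop": {"brand", "cpu", "ram", "storage", "screen_size", "price"},
}


def property_completeness(category: str, page_properties: Set[str]) -> float:
    """Return the fraction of expected properties that the page covers.

    Returns 0.0 when the category is not present in the ontology.
    """
    expected = ONTOLOGY.get(category, set())
    if not expected:
        return 0.0
    return len(expected & page_properties) / len(expected)


# Illustrative inputs (assumption): properties already extracted from two pages.
good_page = {"brand", "cpu", "ram", "storage", "screen_size", "price"}
bad_page = {"brand", "price"}

good_score = property_completeness("laptop", good_page)
bad_score = property_completeness("laptop", bad_page)
```

A score like this could serve as one numeric feature alongside the link-based, click-based, and content-based features mentioned in the abstract before a classifier is trained; how the authors actually weight or combine them is not stated here.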