A hybrid framework for product normalization in online shopping

Li Wang, Rong Zhang, Chaofeng Sha, Xiaofeng He, Aoying Zhou
DOI: https://doi.org/10.1007/978-3-642-37450-0_28
2013-01-01
Abstract: The explosive growth of products in both variety and quantity is clear evidence of the boom in C2C (Customer-to-Customer) E-commerce. Product normalization, which determines whether product listings refer to the same underlying entity, is a fundamental data management task in the C2C market. It is challenging in this setting because the data is noisy and lacks a uniform schema. In this paper, we propose a hybrid framework that achieves product normalization through schema integration and data cleaning. Within the framework, a graph-based method integrates the schema; missing data is filled in and incorrect data is repaired using evidence extracted from surrounding information, such as the title and textual description. We distinguish products by clustering on a product similarity matrix learned through logistic regression. Experiments on real-world data confirm the effectiveness of our design in comparison with existing methods. © Springer-Verlag 2013.
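The last step of the pipeline (pairwise similarity learned by logistic regression, then clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (`pair_features`, `train_logistic`, `cluster`), the two pair features (title token overlap and attribute agreement), the toy training pairs, the 0.5 threshold, and the use of single-link grouping via union-find in place of the clustering algorithm, which the abstract does not specify.

```python
import math
from itertools import combinations

def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def pair_features(p, q):
    """Hypothetical features for a product pair: bias, title overlap, attribute agreement."""
    shared = [k for k in p["attrs"] if k in q["attrs"]]
    agree = sum(p["attrs"][k] == q["attrs"][k] for k in shared) / len(shared) if shared else 0.0
    return [1.0, jaccard(p["title"], q["title"]), agree]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Predicted probability that a pair refers to the same product."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit logistic regression weights with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def cluster(products, w, threshold=0.5):
    """Single-link grouping: merge any two products whose learned similarity passes the threshold."""
    parent = list(range(len(products)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(products)), 2):
        if score(w, pair_features(products[i], products[j])) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(products)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy labeled pairs (made up for illustration): [bias, title overlap, attribute agreement].
X_train = [[1.0, 0.9, 1.0], [1.0, 0.8, 1.0], [1.0, 0.7, 1.0],   # same product
           [1.0, 0.1, 0.0], [1.0, 0.2, 0.5], [1.0, 0.0, 0.5]]   # different products
y_train = [1, 1, 1, 0, 0, 0]
w = train_logistic(X_train, y_train)

# Toy listings (also made up): two listings of one phone and one unrelated listing.
products = [
    {"title": "apple iphone 5 16gb black", "attrs": {"brand": "apple", "storage": "16gb"}},
    {"title": "iphone 5 apple 16gb black new", "attrs": {"brand": "apple", "storage": "16gb"}},
    {"title": "samsung galaxy s3 blue", "attrs": {"brand": "samsung", "storage": "16gb"}},
]
clusters = cluster(products, w)
print(clusters)
```

In this sketch each cluster is a list of indices into `products`. The schema integration and data cleaning stages described in the abstract would run before this step, so that attributes such as `brand` and `storage` are already aligned and repaired when pair features are computed.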