Introducing a novel dataset for product matching: A new challenge for matching systems

Alexander Jesser, Tobias Rettenmeier
DOI: https://doi.org/10.1109/CECCC59577.2023.10560817
2023-10-27
Abstract: Finding and matching the same product across different online shops is an essential task in many e-commerce applications, such as market analysis. Because a global product identifier is usually lacking, this task is non-trivial. This paper describes a novel dataset for product matching that can be used to train and evaluate machine learning models that automate this task. Existing public datasets are either too small for deep learning applications, constructed artificially from data that already carries a global identifier, or too easy for state-of-the-art models. The novelty of the Markt-Pilot dataset is that it is substantially larger than existing datasets for product matching while also consisting of real-world data retrieved by querying the search functions of online shops through web scraping. The dirtiness of the data and the number of edge cases present a greater challenge for product matching systems.
Computer Science, Business