PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends

Apurva Sinha, Ekta Gujral
2024-05-28
Abstract: Product attribute extraction is a growing field in e-commerce, with several applications including product ranking, product recommendation, future assortment planning, and improving customers' online shopping experience. Understanding customer needs is a critical part of online business, particularly for fashion products. Retailers use assortment planning to determine the mix of products to offer in each store and channel, to stay responsive to market dynamics, and to manage inventory and catalogs. The goal is to offer the right styles, in the right sizes and colors, through the right channels. When shoppers find products that meet their needs and desires, they are more likely to return for future purchases, fostering customer loyalty. Product attributes are a key factor in assortment planning. In this paper we present PAE, a product attribute extraction algorithm for future trend reports consisting of text and images in PDF format. Most existing methods focus on attribute extraction from titles or product descriptions, or utilize visual information from existing product images. In contrast, our work focuses on attribute extraction from PDF files that describe upcoming fashion trends. This work proposes a more comprehensive framework that fully utilizes the different modalities for attribute extraction and helps retailers plan their assortment in advance. Our contributions are three-fold: (a) we develop PAE, an efficient framework to extract attributes from unstructured data (text and images); (b) we provide a catalog matching methodology based on BERT representations to discover existing catalog attributes using upcoming attribute values; (c) we conduct extensive experiments with several baselines and show that PAE is an effective and flexible framework that performs on par with or better than existing state-of-the-art methods on the attribute value extraction task (average F1 score of 92.5%).
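Contribution (b) matches upcoming attribute values against attributes already in the catalog using BERT representations. The exact matching procedure is not given here, so the following is only a minimal sketch of embedding-based nearest-neighbour matching under stated assumptions: the vectors are hypothetical toy embeddings standing in for BERT outputs, the function names are illustrative, and the 0.8 cosine-similarity threshold is an arbitrary choice, not a value from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_to_catalog(upcoming, catalog, threshold=0.8):
    """Map each upcoming attribute value to its most similar catalog attribute.

    upcoming, catalog: dicts of attribute text -> embedding vector
    (hypothetical stand-ins for BERT embeddings). Returns a dict of
    upcoming text -> (best catalog text or None, best similarity score);
    None means no catalog attribute reached the threshold.
    """
    matches = {}
    for name, vec in upcoming.items():
        best_name, best_score = None, -1.0
        for cat_name, cat_vec in catalog.items():
            score = cosine(vec, cat_vec)
            if score > best_score:
                best_name, best_score = cat_name, score
        matches[name] = (best_name if best_score >= threshold else None, best_score)
    return matches


# Toy 3-dimensional "embeddings" (hypothetical; real BERT vectors have hundreds of dimensions).
catalog = {
    "red": [1.0, 0.0, 0.0],
    "balloon sleeve": [0.0, 0.1, 1.0],
    "denim": [0.1, 0.9, 0.1],
}
upcoming = {
    "cherry red": [0.9, 0.1, 0.0],
    "puff sleeve": [0.0, 0.2, 0.95],
    "crochet": [0.5, 0.5, 0.5],
}

for value, (match, score) in match_to_catalog(upcoming, catalog).items():
    print(f"{value!r} -> {match!r} (cosine {score:.3f})")
```

With these toy vectors, "cherry red" maps to "red" and "puff sleeve" to "balloon sleeve", while "crochet" has no catalog match above the threshold and would be treated as a genuinely new attribute value. Raising or lowering the threshold trades missed matches against spurious ones.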
Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses product attribute extraction in e-commerce, focusing specifically on the text and image data in fashion trend reports. Most existing methods extract attributes from product titles or descriptions, or use visual information from existing product images. This work instead extracts attributes from PDF files that describe upcoming fashion trends, which helps retailers plan their product assortments in advance. The main contributions of the paper are:
1. PAE (Product Attribute Extraction), a framework that efficiently extracts attributes from unstructured text and image data.
2. A catalog matching approach based on BERT representations that uses upcoming attribute values to discover matching attributes already present in the catalog.
3. Extensive experiments against multiple baseline models, showing that PAE performs on par with or better than state-of-the-art methods on attribute value extraction tasks, with an average F1 score of 92.5%.
The paper also discusses the challenges of extracting text and images from PDF files, such as misspelled text, loss of image quality, and multi-label attribute recognition, and proposes corresponding solutions. Matching the extracted attributes with the product catalog improves the quality of search tags, which leads to a better shopping experience for customers. In addition, the paper discusses how to use unsupervised models to extract interpretable visual attributes from unlabeled data. The experimental results show that PAE performs well on multiple datasets, with F1 scores above 90% for both text and images, demonstrating its effectiveness and flexibility.
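Because a single trend image or passage can carry several attribute values at once (multi-label attribute recognition), F1 is computed over sets of predicted versus gold attribute values. The summary does not say how the reported F1 scores are averaged, so the sketch below assumes micro-averaging (pooling true positives, false positives, and false negatives across all items); the attribute values are hypothetical examples, not data from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label predictions.

    gold, pred: parallel lists where each element is an iterable of
    attribute values for one item (e.g. one trend image or text block).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # predicted and correct
        fp += len(p - g)   # predicted but not in the gold labels
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical attribute labels for three items.
gold = [{"floral", "midi"}, {"linen"}, {"puff sleeve", "white"}]
pred = [{"floral"}, {"linen", "cotton"}, {"puff sleeve", "white"}]

# tp = 4, fp = 1, fn = 1, so precision = recall = 0.8
print(round(micro_f1(gold, pred), 3))
```

Micro-averaging weights every attribute value equally, so frequent attributes dominate the score; a macro average over attribute types would instead give rare attributes equal weight, and the two can differ noticeably on imbalanced fashion labels.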