Abstract: With the growth of the e-commerce industry, multiple modalities, e.g., vision and language, are used to describe product items. Understanding such diverse data is an enormous challenge, especially when extracting attribute-value pairs from text sequences with the aid of helpful image regions. Although a series of previous works have been dedicated to this task, several seldom-investigated obstacles still hinder further improvement: 1) Parameters from upstream single-modal pretraining are inadequately applied, without proper joint fine-tuning in the downstream multi-modal task. 2) To select descriptive parts of images, a simple late fusion is widely applied, ignoring the prior knowledge that language-related information should be encoded into a common linguistic embedding space by stronger encoders. 3) Because products are diverse, their attribute sets tend to vary greatly, yet current approaches predict over an unnecessarily maximal attribute range, which leads to more potential false positives. To address these issues, we propose a novel approach to boost multi-modal e-commerce attribute value extraction via a unified learning scheme and dynamic range minimization: 1) First, a unified scheme is designed to jointly train a multi-modal task with pretrained single-modal parameters. 2) Second, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model. 3) Third, a prototype-guided attribute range minimization method is proposed to first determine the proper attribute set of the current product, and then select prototypes to guide the prediction of the chosen attributes. Experiments on popular multi-modal e-commerce benchmarks show that our approach achieves superior performance over other state-of-the-art techniques.
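To make the idea of attribute range minimization more concrete, the following is a minimal toy sketch and not the method proposed in the paper. It assumes that some upstream model has already produced a per-attribute presence score for a product and a candidate value for every attribute in the maximal range. The function names, the threshold, and the toy data are all hypothetical, and the prototype-guided step is omitted entirely.

```python
from typing import Dict, List


def select_attribute_range(presence_scores: Dict[str, float],
                           threshold: float = 0.5) -> List[str]:
    """Keep only attributes whose (hypothetical) presence score passes the threshold."""
    return sorted(a for a, s in presence_scores.items() if s >= threshold)


def extract_values(candidates: Dict[str, str],
                   attribute_range: List[str]) -> Dict[str, str]:
    """Restrict candidate attribute-value pairs to the given attribute range."""
    allowed = set(attribute_range)
    return {a: v for a, v in candidates.items() if a in allowed}


# Toy product: a cotton T-shirt. Scores and candidate values are made up.
presence_scores = {
    "color": 0.92,
    "material": 0.81,
    "screen size": 0.07,   # irrelevant to clothing
    "heel height": 0.12,   # irrelevant to clothing
}
candidates = {
    "color": "red",
    "material": "cotton",
    "screen size": "15",     # spurious value a maximal-range predictor might emit
    "heel height": "low",    # spurious value a maximal-range predictor might emit
}

# Maximal range: every attribute is predicted, including the spurious ones.
full_range = extract_values(candidates, list(presence_scores))

# Minimized range: only attributes judged relevant to this product are predicted.
minimized = select_attribute_range(presence_scores)
reduced = extract_values(candidates, minimized)

print(full_range)
print(reduced)
```

In this toy setting, narrowing the range before value prediction drops the two spurious pairs, which illustrates why a smaller, product-specific attribute set can reduce potential false positives.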
