Abstract: Mercari is the largest C2C e-commerce marketplace in Japan, with more than 20 million monthly active users. Because search is the fundamental way to discover desired items, we have always had a substantial amount of data with implicit feedback. Although we actively use that data to provide the best service for our users, the correlation between implicit feedback and tasks such as image quality assessment is not trivial. Many traditional lines of research in Machine Learning (ML) are similarly motivated by the insatiable appetite of Deep Learning (DL) models for well-labelled training data. Weak supervision is about leveraging higher-level and/or noisier supervision over unlabeled data. Large Language Models (LLMs) are being actively studied and used for data labelling tasks. We present how we leverage Chain-of-Thought (CoT) to enable an LLM to produce image aesthetics labels that correlate well with human behavior in e-commerce settings. Leveraging LLMs is more cost-effective than explicit human judgment, and it significantly improves the explainability of deep image quality evaluation, which is highly important for customer journey optimization at Mercari. We propose a cost-efficient LLM-driven approach for assessing and predicting image quality in e-commerce settings, which is very convenient for proof-of-concept testing. We show that our LLM-produced labels correlate with user behavior on Mercari. Finally, we show results from an online experiment, in which we achieved significant growth in sales on the web platform.

What problem does this paper attempt to address?

This paper addresses how to evaluate and predict the quality of product images efficiently and cost-effectively on a C2C e-commerce platform such as Mercari, and how image quality affects user behavior. Specifically, the authors focus on the following points:

1. **The relationship between implicit feedback and image quality**:
   - Traditional image quality assessment usually requires a large amount of manually annotated data, which is both time-consuming and expensive. The paper proposes generating labels with large language models (LLMs) to reduce the dependence on explicit human annotation.
   - By using implicit feedback (such as users' clicks), the authors aim to reveal the specific impact of image quality on user behavior.
2. **Improving the search experience**:
   - Mercari is a two-way online marketplace where users can buy and sell a variety of new and used products. Buyers usually find products through search queries, and while they browse, the image is one of the first pieces of information they see. High-quality images are therefore crucial for attracting users' attention.
   - The authors aim to improve the search experience through better image quality assessment, thereby increasing user engagement and purchase rate.
3. **Cost-effectiveness and technical feasibility**:
   - The paper proposes an LLM-based method of image quality assessment. It significantly reduces costs and uses click behavior as a key indicator in offline model evaluation to verify the effectiveness of the model.
   - Online experiments demonstrate the feasibility of the method in practice, most notably a significant increase in sales (almost 7%) on the web platform.

### Formula summary

- **Loss function**:
  \[ L(x_i, x_j)=\sum_{i=1}^{N}\sum_{j=1}^{N} I[y_i>y_j]\cdot F_L([f_\theta(x_i), f_\theta(x_j)], y=0) \]
  where \( f_\theta(x_i) \) and \( f_\theta(x_j) \) are the scores \( s_i \) and \( s_j \) predicted by the MLP, and \( F_L \) denotes Focal Loss.
- **Ranking formula**:
  - **Multiplication method**:
    \[ \text{final\_score}=\text{existing\_relevance\_score}\times\text{image\_score} \]
  - **Addition method**:
    \[ \text{final\_score}=\text{existing\_relevance\_score}+\text{image\_score} \]

### Key contributions

1. **Data generation method**: A method of decoupling implicit feedback from relevance is proposed, highlighting the contribution of image quality to users' clicks.
2. **The correlation between LLM-generated labels and user behavior**: The image quality labels generated by LLMs are shown to correlate with user behavior on Mercari.
3. **Performance improvement**: The proposed Image Score model significantly outperforms relevance-based and popularity-based baselines, such as CLIP scores and historical click-through rates (CTR), in click prediction, and it achieved a significant increase in sales on the web platform.

These contributions provide a useful reference for designing an efficient, low-cost image quality assessment pipeline and for analyzing the correlation between image quality and implicit feedback.
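
The loss and ranking formulas in the formula summary above can be illustrated with a minimal Python sketch. It rests on several assumptions that the summary does not confirm: that \( y = 0 \) means "the first score of the pair is the target class" in a two-way softmax over \( [s_i, s_j] \), that Focal Loss takes its common form \( -(1 - p_t)^\gamma \log p_t \) without a class-weighting term, and that \( \gamma = 2 \). The function names `focal_loss_first_wins`, `pairwise_focal_loss`, and `final_score` are hypothetical and not taken from the paper.

```python
import math


def focal_loss_first_wins(s_i, s_j, gamma=2.0):
    """Focal loss on a 2-way softmax over [s_i, s_j] with target class 0.

    Assumption: gamma=2.0 and no alpha weighting; the summary gives neither.
    """
    # softmax([s_i, s_j])[0] is the same as sigmoid(s_i - s_j).
    d = s_i - s_j
    # log p_t = -log(1 + exp(-d)), written in a numerically stable form.
    log_p = -(max(-d, 0.0) + math.log1p(math.exp(-abs(d))))
    p = math.exp(log_p)
    return -((1.0 - p) ** gamma) * log_p


def pairwise_focal_loss(scores, labels, gamma=2.0):
    """Sum the focal loss over all ordered pairs (i, j) with labels[i] > labels[j]."""
    total = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:  # indicator I[y_i > y_j]
                total += focal_loss_first_wins(scores[i], scores[j], gamma)
    return total


def final_score(existing_relevance_score, image_score, method="multiply"):
    """Combine the existing relevance score with the image score."""
    if method == "multiply":
        return existing_relevance_score * image_score
    if method == "add":
        return existing_relevance_score + image_score
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # Hypothetical toy data: three items with model scores and quality labels.
    scores = [1.5, 0.2, -0.3]
    labels = [2, 1, 0]
    print("pairwise focal loss:", pairwise_focal_loss(scores, labels))
    print("multiplied:", final_score(2.0, 0.5, "multiply"))
    print("added:", final_score(2.0, 0.5, "add"))
```

Under these assumptions, a pair whose higher-labelled item already has the higher score contributes little to the loss, while a mis-ordered pair contributes much more. The two combination methods also behave differently: multiplication rescales the relevance score by the image score, whereas addition shifts it by a fixed amount regardless of how relevant the item is.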

Image Score: Learning and Evaluating Human Preferences for Mercari Search

Online Metric Learning for Relevance Feedback in E-Commerce Image Retrieval Applications

Understanding Image Quality and Trust in Peer-to-Peer Marketplaces

Community-Aware Photo Quality Evaluation by Deeply Encoding Human Perception

Demand Analytics in E-Commerce Leveraging Computer Vision Algorithms

What Image Do You Need? A Two-stage Framework for Image Selection in E-commerce

VSEM-SAMMI: An Explainable Multimodal Learning Approach to Predict User-Generated Image Helpfulness and Product Sales

Leveraging Large Language Models to Enhance Personalized Recommendations in E-commerce

Images Don't Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank

Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Investigating LLM Applications in E-Commerce

COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation

When Relevance is not Enough: Promoting Visual Attractiveness for Fashion E-commerce

Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study

LLMs in e-commerce: A comparative analysis of GPT and LLaMA models in product review evaluation

Can users embed their user experience in user-generated images? Evidence from JD.com

Multimodal Deep Learning of Word-of-Mouth Text and Demographics to Predict Customer Rating: Handling Consumer Heterogeneity in Marketing

Knowledge Graph Completion Models are Few-shot Learners: An Empirical Study of Relation Labeling in E-commerce with LLMs

LALDM: A Multimodal Aspect Level Text Analysis Method and Its Application in Online Consumer Electronics

Considering User Agreement in Learning to Predict the Aesthetic Quality

Balancing Efficiency and Effectiveness: An LLM-Infused Approach for Optimized CTR Prediction