Image Score: Learning and Evaluating Human Preferences for Mercari Search

Chingis Oinar, Miao Cao, Shanshan Fu
2024-08-21
Abstract: Mercari is the largest C2C e-commerce marketplace in Japan, with more than 20 million monthly active users. Because search is the fundamental way to discover desired items, we have always had a substantial amount of data with implicit feedback. Although we actively take advantage of that data to provide the best service for our users, the correlation between implicit feedback and tasks such as image quality assessment is not trivial. Many traditional lines of research in Machine Learning (ML) are similarly motivated by the insatiable appetite of Deep Learning (DL) models for well-labelled training data. Weak supervision is about leveraging higher-level and/or noisier supervision over unlabeled data. Large Language Models (LLMs) are being actively studied and used for data labelling tasks. We present how we leverage Chain-of-Thought (CoT) prompting to enable an LLM to produce image aesthetics labels that correlate well with human behavior in e-commerce settings. Leveraging LLMs is more cost-effective than explicit human judgment, and it significantly improves the explainability of deep image quality evaluation, which is highly important for customer journey optimization at Mercari. We propose a cost-efficient LLM-driven approach for assessing and predicting image quality in e-commerce settings, which is very convenient for proof-of-concept testing. We show that our LLM-produced labels correlate with user behavior on Mercari. Finally, we show the results of an online experiment, in which we achieved significant growth in sales on the web platform.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the following problem: on C2C e-commerce platforms such as Mercari, how can the quality of product images be evaluated and predicted efficiently and cost-effectively, and how does image quality affect user behavior? Specifically, the authors focus on the following points:

1. **The relationship between implicit feedback and image quality**:
   - Traditional image quality assessment tasks usually require a large amount of manually annotated data, which is both time-consuming and expensive to produce. This paper proposes generating labels with large language models (LLMs) to reduce the dependence on explicit human annotation.
   - By using implicit feedback (such as users' click behavior), the authors aim to reveal the specific impact of image quality on user behavior.
2. **Improving the search experience**:
   - Mercari is a two-sided online marketplace where users can buy and sell a variety of new and used products. Buyers usually find products through search queries, and while browsing, the image is one of the first pieces of information they see. High-quality images are therefore crucial for attracting users' attention.
   - The authors aim to optimize the search experience by improving the image quality assessment method, thereby increasing user engagement and purchase rate.
3. **Cost-effectiveness and technical feasibility**:
   - The paper proposes an LLM-based method of image quality assessment. The method significantly reduces labelling costs and uses click behavior as a key indicator in offline evaluation to verify the effectiveness of the model.
   - Online experiments demonstrate the feasibility of the method in practice, in particular a significant increase in sales volume (almost 7%) on the Web platform.
### Formula summary

- **Loss function**:

  \[ L(x_i, x_j)=\sum_{i = 1}^{N}\sum_{j = 1}^{N}I[y_i>y_j]\cdot F_L([f_\theta(x_i), f_\theta(x_j)], y = 0) \]

  where \( f_\theta(x_i) \) and \( f_\theta(x_j) \) are the scores \( s_i \) and \( s_j \) predicted by the MLP, \( I[\cdot] \) is the indicator function, and \( F_L \) represents Focal Loss.
- **Ranking formula**:
  - **Multiplication method**: \[ \text{final\_score}=\text{existing\_relevance\_score}\times\text{image\_score} \]
  - **Addition method**: \[ \text{final\_score}=\text{existing\_relevance\_score}+\text{image\_score} \]

### Key contributions

1. **Data generation method**: A method of decoupling implicit feedback from relevance is proposed, which highlights the contribution of image quality to users' clicks.
2. **The correlation between LLM-generated labels and user behavior**: The image quality labels generated by LLMs are shown to correlate with users' behavior on Mercari.
3. **Performance improvement**: In click prediction, the proposed Image Score model significantly outperforms baselines based on relevance and popularity, such as CLIP scores and historical click-through rates (CTR), and it achieved a significant increase in sales volume on the Web platform.

These contributions provide a valuable reference for designing an efficient, low-cost image quality assessment pipeline and for analyzing the correlation between image quality and implicit feedback.
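
The pairwise loss and the two ranking rules from the formula summary can be illustrated with a minimal, dependency-free Python sketch. It rests on several assumptions that the summary does not confirm: the pair of scores \( [s_i, s_j] \) is treated as two-class logits with target class 0 (meaning item \( i \) should win), Focal Loss is taken in its common form \( -(1-p_t)^\gamma \log p_t \) with \( \gamma = 2 \) and no class weighting, and the function names (`focal_loss_two_class`, `pairwise_focal_loss`, `rerank`) are hypothetical. It is not the authors' implementation.

```python
import math
from itertools import product


def focal_loss_two_class(logits, target, gamma=2.0):
    """Focal loss -(1 - p_t)^gamma * log(p_t) for one softmax over `logits`.

    `gamma` = 2.0 is an assumed default, not a value taken from the paper.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    p_target = exps[target] / sum(exps)
    return -((1.0 - p_target) ** gamma) * math.log(p_target)


def pairwise_focal_loss(scores, labels, gamma=2.0):
    """Sum the focal loss over all ordered pairs (i, j) with labels[i] > labels[j].

    `scores` play the role of f_theta(x_i); `labels` are the quality labels y_i.
    Target class 0 encodes "the first item of the pair should score higher".
    """
    total = 0.0
    for i, j in product(range(len(scores)), repeat=2):
        if labels[i] > labels[j]:  # indicator I[y_i > y_j]
            total += focal_loss_two_class(
                [scores[i], scores[j]], target=0, gamma=gamma
            )
    return total


def rerank(items, mode="multiply"):
    """Order items by final_score, combining relevance and image score.

    `items` is a list of (item_id, existing_relevance_score, image_score).
    """
    def final_score(item):
        _, relevance, image_score = item
        if mode == "multiply":
            return relevance * image_score
        if mode == "add":
            return relevance + image_score
        raise ValueError(f"unknown mode: {mode}")

    ranked = sorted(items, key=final_score, reverse=True)
    return [item_id for item_id, _, _ in ranked]


if __name__ == "__main__":
    labels = [2, 1, 0]  # higher label = better image
    good = pairwise_focal_loss([2.0, 0.5, -1.0], labels)  # scores agree with labels
    bad = pairwise_focal_loss([-1.0, 0.5, 2.0], labels)   # scores contradict labels
    print(good < bad)  # True

    items = [("a", 0.9, 0.2), ("b", 0.6, 0.8), ("c", 0.5, 0.5)]
    print(rerank(items, "multiply"))  # ['b', 'c', 'a']
    print(rerank(items, "add"))       # ['b', 'a', 'c']
```

In this toy example the two ranking rules disagree on item `a`: multiplication lets its low image score pull a highly relevant item to the bottom, while addition keeps it in second place. That difference is the practical trade-off between the two combination methods.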