Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin

2024-10-31

Abstract: Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at https://github.com/KL4805/ShoppingMMLU. In addition, with Shopping MMLU, we host a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website https://amazon-kddcup24.github.io/.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the difficulty of treating online shopping as a complex multi-task, few-shot learning problem. Existing models and benchmarks are usually designed for specific tasks and therefore cannot capture the full complexity of online shopping, particularly domain-specific concepts, implicit knowledge, heterogeneous user behaviors, and multilingual tasks. Because of their multi-task and few-shot learning capabilities, large language models (LLMs) could substantially change online shopping by reducing task-specific engineering effort and offering users interactive conversations. However, LLMs face distinct challenges in this domain: understanding domain-specific concepts, applying implicit knowledge, aligning with user behaviors, and handling multiple languages. To address these challenges, the paper proposes Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU contains 57 tasks covering 4 main shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality. Together, these tasks comprehensively evaluate the capabilities of LLMs as general shop assistants. Using Shopping MMLU, the authors benchmark more than 20 existing LLMs and report insights about the practices and prospects of building versatile LLM-based shop assistants. The paper also notes that Shopping MMLU may benefit work beyond online shopping: its findings may help in building domain-specific LLMs for other user-oriented services, which share similar multi-task and few-shot learning characteristics.

LLaSA: Large Language and E-Commerce Shopping Assistant

Investigating LLM Applications in E-Commerce

LiLiuM: eBay's Large Language Models for e-commerce

Knowledge Graph Completion Models are Few-shot Learners: An Empirical Study of Relation Labeling in E-commerce with LLMs

Deep Cascade Multi-Task Learning for Slot Filling in Online Shopping Assistant

LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

CMMLU: Measuring massive multitask language understanding in Chinese

A Survey on Benchmarks of Multimodal Large Language Models

Efficient Multimodal Large Language Models: A Survey

MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria

LLaMA-E: Empowering E-commerce Authoring with Object-Interleaved Instruction Following

Emerging Synergies Between Large Language Models and Machine Learning in Ecommerce Recommendations

Leveraging Large Language Models to Enhance Personalized Recommendations in E-commerce

IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce

Fine-tuning Multimodal Large Language Models for Product Bundling

A survey on fairness of large language models in e-commerce: progress, application, and challenge

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Winning Amazon KDD Cup'24

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

PUMGPT: A Large Vision-Language Model for Product Understanding