Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin
2024-10-31
Abstract: Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at https://github.com/KL4805/ShoppingMMLU. In addition, with Shopping MMLU, we host a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website https://amazon-kddcup24.github.io/.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the difficulty of treating online shopping as a complex multi-task, few-shot learning problem. Existing models and benchmarks are usually designed for specific tasks and cannot capture the full complexity of online shopping. Large language models (LLMs), with their multi-task and few-shot learning capabilities, could profoundly change online shopping by reducing task-specific engineering effort and by offering users interactive conversations. However, LLMs face challenges specific to this domain: understanding domain-specific concepts, applying implicit knowledge, aligning with heterogeneous user behaviors, and handling multilingual tasks. To address these problems, the paper proposes Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU contains 57 tasks covering 4 main shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multilingual ability. Together, these tasks give a comprehensive evaluation of LLMs as general shopping assistants. Using Shopping MMLU, the authors benchmark more than 20 existing LLMs and report insights on the practices and prospects of building versatile LLM-based shopping assistants. The paper also argues that Shopping MMLU can advance research on online shopping, and that its findings may help in building domain-specific LLMs for other user-oriented services, which share similar multi-task and few-shot learning characteristics.