Scaling Law of Large Sequential Recommendation Models

Gaowei Zhang,Yupeng Hou,Hongyu Lu,Yu Chen,Wayne Xin Zhao,Ji-Rong Wen

2023-11-19

Abstract: Scaling of neural networks has recently shown great potential to improve model capacity in various fields. Specifically, model performance has a power-law relationship with model size or data size, which provides important guidance for the development of large-scale models. However, there is still limited understanding of the scaling effect of user behavior models in recommender systems, where the unique data characteristics (e.g., data scarcity and sparsity) pose new challenges for exploring the scaling effect in recommendation tasks. In this work, we focus on investigating the scaling laws in large sequential recommendation models. Specifically, we consider a pure ID-based task formulation, where the interaction history of a user is formatted as a chronological sequence of item IDs. We do not incorporate any side information (e.g., item text), because we would like to explore how the scaling law holds from the perspective of user behavior. With specially improved strategies, we scale up the model size to 0.8B parameters, making it feasible to explore the scaling effect across a diverse range of model sizes. As our major finding, we empirically show that the scaling law still holds for these trained models, even in data-constrained scenarios. We then fit the curve for the scaling law and successfully predict the test loss of the two largest tested model scales. Furthermore, we examine the performance advantage of the scaling effect on five challenging recommendation tasks, considering the unique issues (e.g., cold start, robustness, long-term preference) in recommender systems. We find that scaling up the model size can greatly boost the performance on these challenging tasks, which again verifies the benefits of large recommendation models.

Information Retrieval

What problem does this paper attempt to address?

The main problem this paper attempts to address is whether scaling laws hold for large-scale sequential recommendation models. Specifically, the paper focuses on the following points:

1. **Exploring the applicability of scaling laws in recommendation systems**: Although power-law relationships between model performance and model size or data size have been observed in fields such as Natural Language Processing (NLP) and Computer Vision (CV), the understanding of scaling effects in recommendation systems, particularly in user behavior models, remains limited. The paper investigates whether these scaling laws apply to large-scale sequential recommendation models.
2. **Addressing data sparsity and noise in recommendation systems**: Interaction data in recommendation systems is typically highly sparse and noisy, which poses new challenges for exploring scaling effects. The paper verifies experimentally that scaling laws still hold even under data-constrained conditions.
3. **Evaluating the performance of large-scale models in complex recommendation tasks**: The paper designs five challenging recommendation task settings (long-tail item recommendation, cold-start user recommendation, multi-domain transfer, robustness challenges, and long-term trajectory prediction) to assess the performance advantages of large-scale models on these tasks.
4. **Developing stable training strategies**: Because large-scale Transformer models can be unstable during training, the paper proposes several improved training strategies, such as layer-adaptive dropout and an optimizer-switching strategy, to achieve more stable training.

Through these studies, the paper aims to provide guidance and an empirical basis for the design and optimization of large-scale recommendation models.
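The idea of fitting a scaling curve on smaller models and using it to predict the test loss of larger ones can be illustrated with a minimal sketch. It assumes a pure power law of the form `loss = a * N^(-alpha)`, fitted by least squares in log-log space; the paper's exact functional form may differ (for example, it may include an irreducible-loss term). The function names, the constants `TRUE_A` and `TRUE_ALPHA`, and the data points are all hypothetical and synthetic, not results from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by ordinary least squares in log-log space.

    Taking logs gives log(loss) = log(a) - alpha * log(size), a straight line,
    so the slope and intercept have closed-form least-squares solutions.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)


def predict_loss(a, alpha, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-alpha)


# Synthetic, noise-free data generated from assumed constants (illustration only).
TRUE_A, TRUE_ALPHA = 20.0, 0.07
sizes = [1e5, 1e6, 1e7, 1e8, 4e8, 8e8]  # hypothetical parameter counts
losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in sizes]

# Fit on all but the two largest sizes, then extrapolate to those two.
a_hat, alpha_hat = fit_power_law(sizes[:-2], losses[:-2])
for n, actual in zip(sizes[-2:], losses[-2:]):
    predicted = predict_loss(a_hat, alpha_hat, n)
    print(f"N={n:.0e}  predicted={predicted:.4f}  actual={actual:.4f}")
```

With real measurements the losses are noisy, so the extrapolation error on the held-out sizes is what indicates whether the fitted law is trustworthy.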

Scaling New Frontiers: Insights into Large Recommendation Models

Understanding Scaling Laws for Recommendation Models

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Scaling Graph Neural Networks for Large-Scale Power Systems Analysis: Empirical Laws for Emergent Abilities

Scaling Laws For Dense Retrieval

Scaling Sequential Recommendation Models with Transformers

On the Embedding Collapse when Scaling up Recommendation Models

Scaling Law for Time Series Forecasting

Towards Neural Scaling Laws on Graphs

Unified Neural Network Scaling Laws and Scale-time Equivalence

Explaining Neural Scaling Laws

A Dynamical Model of Neural Scaling Laws

Linear-Time Graph Neural Networks for Scalable Recommendations

Scaling Laws for the Value of Individual Data Points in Machine Learning

A Hitchhiker's Guide to Scaling Law Estimation

A Solvable Model of Neural Scaling Laws

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

A Resource Model For Neural Scaling Law

Towards Neural Scaling Laws for Time Series Foundation Models

Scaling Laws in Linear Regression: Compute, Parameters, and Data