Repeated Padding for Sequential Recommendation

Yizhou Dang, Yuting Liu, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang, Jianzhe Zhao
2024-07-30
Abstract: Sequential recommendation aims to provide users with personalized suggestions based on their historical interactions. When training sequential models, padding is a widely adopted technique for two main reasons: 1) The vast majority of models can only handle fixed-length sequences; 2) Batching-based training needs to ensure that the sequences in each batch have the same length. The special value 0 is usually used as the padding content, which does not contain the actual information and is ignored in the model calculations. This common-sense padding strategy leads us to a problem that has never been explored before: Can we fully utilize this idle input space by padding other content to further improve model performance and training efficiency? In this paper, we propose a simple yet effective padding method called Repeated Padding (RepPad). Specifically, we use the original interaction sequences as the padding content and fill it to the padding positions during model training. This operation can be performed a finite number of times or repeated until the input sequences' length reaches the maximum limit. Our RepPad can be viewed as a sequence-level data augmentation strategy. Unlike most existing works, our method contains no trainable parameters or hyperparameters and is a plug-and-play data augmentation operation. Extensive experiments on various categories of sequential models and five real-world datasets demonstrate the effectiveness and efficiency of our approach. The average recommendation performance improvement is up to 60.3% on GRU4Rec and 24.3% on SASRec. We also provide in-depth analysis and explanation of what makes RepPad effective from multiple perspectives. Our datasets and codes are available at https://github.com/KingGugu/RepPad.
Information Retrieval
What problem does this paper attempt to address?
The paper attempts to address the problem of how to utilize the idle input space in padding techniques to improve model performance and training efficiency in sequential recommendation. Specifically, traditional padding strategies typically use the special value 0 as padding content, which does not contain actual information and does not participate in model computation. The authors propose a simple and effective padding method, Repeated Padding (RepPad), which uses the original interaction sequence as padding content to fill the idle space in the input sequence, thereby achieving the effect of data augmentation. This method does not require additional training processes or tunable parameters and can be applied as a plug-and-play data augmentation operation to various sequential recommendation models.

### Main Contributions of the Paper:

1. **Proposed a new padding paradigm**: Called Repeated Padding (RepPad), this method does not contain any parameters and is a plug-and-play data augmentation method.
2. **Extensive experimental validation**: Comprehensive experiments were conducted on multiple datasets, different types of sequential models, and augmentation methods, demonstrating the effectiveness and efficiency of RepPad.
3. **In-depth analysis**: Analyzed why RepPad is effective from the perspectives of loss convergence and gradient stability.

### Key Points of the Paper:

- **Problem Background**: The task of sequential recommendation aims to predict the next user-item interaction based on the user's historical interaction sequence. When training sequential models, padding is a widely adopted technique mainly because most models can only handle fixed-length sequences, and batch training requires ensuring that the sequence lengths in each batch are the same.
- **Limitations of Traditional Padding Strategies**: Traditional padding strategies use the special value 0 as padding content, which does not contain actual information and does not participate in model computation. This leaves a large amount of input space idle.
- **Core Idea of RepPad**: The original interaction sequence is reused as padding content to fill the idle space in the input sequence, thereby achieving the effect of data augmentation. This operation can be applied a limited number of times or repeated until the input sequence reaches the maximum length.
- **Experimental Results**: Experimental results on multiple datasets show that RepPad can significantly improve the performance of recommendation models, especially on short sequence datasets. For long sequence datasets, the effect of RepPad may not be significant, but it does not negatively impact performance.

### Summary:

The paper proposes a new padding method, Repeated Padding (RepPad), which improves the performance and training efficiency of sequential recommendation models by utilizing idle input space. Experimental results show that RepPad performs well on multiple datasets and models, especially when dealing with short sequence datasets. This method provides a new approach to data augmentation in the field of sequential recommendation.
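
The contrast between conventional zero padding and the core idea of RepPad can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the linked repository). It rests on several assumptions that are not stated in the text above: padding is placed on the left, over-long sequences are truncated to their most recent items, only whole copies of the sequence are inserted, and nothing separates consecutive copies. The names `zero_pad`, `rep_pad`, and `max_repeats` are hypothetical.

```python
from typing import List, Optional

PAD = 0  # the special padding value described in the abstract


def zero_pad(seq: List[int], max_len: int) -> List[int]:
    """Conventional padding: fill the idle positions with PAD.

    Assumption: padding goes on the left and over-long sequences
    keep only their most recent max_len items.
    """
    seq = seq[-max_len:]
    return [PAD] * (max_len - len(seq)) + seq


def rep_pad(seq: List[int], max_len: int,
            max_repeats: Optional[int] = None) -> List[int]:
    """Repeated padding sketch: reuse the sequence itself as padding.

    max_repeats=None repeats until no further whole copy fits;
    an integer caps the number of extra copies (hypothetical knob
    mirroring "a finite number of times" in the abstract).
    Positions that no whole copy can fill fall back to PAD.
    """
    seq = seq[-max_len:]
    if not seq:
        return [PAD] * max_len
    fit = max_len // len(seq)  # whole copies that fit, original included
    copies = fit if max_repeats is None else min(fit, 1 + max_repeats)
    body = seq * copies
    return [PAD] * (max_len - len(body)) + body


if __name__ == "__main__":
    history = [1, 2, 3]
    print(zero_pad(history, 8))                 # conventional padding
    print(rep_pad(history, 8))                  # repeat until full
    print(rep_pad(history, 10, max_repeats=1))  # one extra copy only
```

With a short history such as `[1, 2, 3]` and a maximum length of 8, the zero-padded input carries three informative positions, while the repeated version carries six. A sequence that already fills more than half of the maximum length leaves no room for a second copy, so both functions return the same result, which is consistent with the summary's observation that the effect is smaller on long-sequence datasets.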