Multi-Epoch Learning for Deep Click-Through Rate Prediction Models

Zhaocheng Liu, Zhongxiang Fan, Jian Liang, Dongying Kong, Han Li
2023-05-31
Abstract: The one-epoch overfitting phenomenon has been widely observed in industrial Click-Through Rate (CTR) applications, where the model performance experiences a significant degradation at the beginning of the second epoch. Recent advances try to understand the underlying factors behind this phenomenon through extensive experiments. However, it is still unknown whether a multi-epoch training paradigm could achieve better results, as the best performance is usually achieved by one-epoch training. In this paper, we hypothesize that the emergence of this phenomenon may be attributed to the susceptibility of the embedding layer to overfitting, which can stem from the high-dimensional sparsity of data. To maintain feature sparsity while simultaneously avoiding overfitting of embeddings, we propose a novel Multi-Epoch learning with Data Augmentation (MEDA), which can be directly applied to most deep CTR models. MEDA achieves data augmentation by reinitializing the embedding layer in each epoch, thereby avoiding embedding overfitting and simultaneously improving convergence. To the best of our knowledge, MEDA is the first multi-epoch training paradigm designed for deep CTR prediction models. We conduct extensive experiments on several public datasets, and the effectiveness of our proposed MEDA is fully verified. Notably, the results show that MEDA can significantly outperform the conventional one-epoch training. Besides, MEDA has exhibited significant benefits in a real-world scene on Kuaishou.
Information Retrieval, Machine Learning
What problem does this paper attempt to address?
This paper addresses the one-epoch overfitting phenomenon that is widely observed in Click-Through Rate (CTR) prediction models. In industrial applications, model performance drops significantly as soon as training enters the second epoch. Existing research has tried to understand the factors behind this phenomenon through extensive experiments, but it remains unknown whether a multi-epoch training paradigm can achieve better results, because the best performance is usually obtained with one-epoch training. The paper hypothesizes that the phenomenon is caused by the embedding layer's susceptibility to overfitting, which may stem from the high-dimensional sparsity of the data. To avoid overfitting of the embedding layer while maintaining feature sparsity, the paper proposes Multi-Epoch learning with Data Augmentation (MEDA), which can be applied directly to most deep CTR models. MEDA achieves data augmentation by reinitializing the embedding layer in each training epoch, thereby avoiding overfitting of the embedding layer while also speeding up the convergence of the MLP. The experimental results show that MEDA significantly outperforms conventional one-epoch training and delivers significant benefits in a real-world scenario at Kuaishou.
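The per-epoch reinitialization described above can be illustrated with a minimal sketch. It is not the authors' implementation: it is a toy logistic model over a single categorical feature, written in pure Python, and every name in it (`init_embeddings`, `train_meda`, the hyperparameters) is hypothetical. It also assumes that the dense parameters (standing in for the MLP) carry over from one epoch to the next while only the embedding table is redrawn, which is how the summary above reads but is not spelled out there.

```python
import math
import random


def init_embeddings(num_ids, dim, rng):
    """Return a fresh, randomly initialized embedding table (num_ids x dim)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_ids)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_meda(samples, num_ids, dim=4, epochs=3, lr=0.1, seed=0):
    """Toy multi-epoch training with per-epoch embedding reinitialization.

    samples: list of (feature_id, label) pairs with label in {0, 1}.
    Model:   p = sigmoid(w . emb[feature_id] + b)
    """
    rng = random.Random(seed)
    # Dense parameters (assumed to persist across epochs).
    w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    b = 0.0
    emb = None
    history = []  # mean log loss per epoch
    for _ in range(epochs):
        # Core idea from the text: redraw the embedding table every epoch,
        # so embeddings never see the same sample twice from the same state.
        emb = init_embeddings(num_ids, dim, rng)
        total = 0.0
        for fid, y in samples:
            e = emb[fid]
            p = sigmoid(sum(wk * ek for wk, ek in zip(w, e)) + b)
            total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            g = p - y  # gradient of the log loss w.r.t. the logit
            for k in range(dim):
                w_old = w[k]
                w[k] -= lr * g * e[k]   # update dense weight
                e[k] -= lr * g * w_old  # update embedding in place
            b -= lr * g
        history.append(total / len(samples))
    return w, b, emb, history
```

A call such as `train_meda([(0, 1), (1, 0), (2, 1)], num_ids=3)` returns the dense weights, the bias, the last epoch's embedding table, and the per-epoch mean loss. In a real deep CTR model the same pattern would mean re-running the embedding layer's initializer at each epoch boundary while leaving the optimizer state and MLP weights alone; whether the optimizer state for the embeddings should also be reset is not stated in the text above.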