Multi-Epoch Learning for Deep Click-Through Rate Prediction Models

Zhaocheng Liu, Zhongxiang Fan, Jian Liang, Dongying Kong, Han Li

2023-05-31

Abstract: The one-epoch overfitting phenomenon has been widely observed in industrial Click-Through Rate (CTR) applications, where model performance degrades significantly at the beginning of the second epoch. Recent studies have tried to understand the factors behind this phenomenon through extensive experiments. However, it is still unknown whether a multi-epoch training paradigm could achieve better results, since the best performance is usually achieved by one-epoch training. In this paper, we hypothesize that the phenomenon may be attributed to the susceptibility of the embedding layer to overfitting, which can stem from the high-dimensional sparsity of the data. To maintain feature sparsity while avoiding overfitting of the embeddings, we propose a novel Multi-Epoch learning with Data Augmentation (MEDA) method, which can be directly applied to most deep CTR models. MEDA achieves data augmentation by reinitializing the embedding layer in each epoch, thereby avoiding embedding overfitting and simultaneously improving convergence. To the best of our knowledge, MEDA is the first multi-epoch training paradigm designed for deep CTR prediction models. We conduct extensive experiments on several public datasets, and the results fully verify the effectiveness of the proposed MEDA. Notably, the results show that MEDA can significantly outperform conventional one-epoch training. Moreover, MEDA has shown significant benefits in a real-world scenario at Kuaishou.

Information Retrieval, Machine Learning
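The sparsity hypothesis can be made concrete with a small counting experiment. The sketch below is an illustration, not something taken from the paper. It draws feature IDs from an assumed Zipf-like distribution over a large ID vocabulary and reports what fraction of the IDs seen in one pass occur at most five times. The vocabulary size, sample count, exponent, and the threshold of five are all arbitrary choices. Each such rare ID's embedding row receives only that handful of gradient updates per epoch, and a second epoch replays exactly the same examples for it. This is one way to see why the embedding layer, rather than the shared dense layers, could be the part that overfits.

```python
import random
from collections import Counter


def id_frequency_profile(vocab_size, num_samples, exponent=1.1, rare_max=5, seed=0):
    """Sample feature IDs from a Zipf-like distribution (an assumed stand-in
    for real CTR ID features) and measure how many seen IDs are rare."""
    rng = random.Random(seed)
    # Unnormalized Zipf weights: the ID with 1-based rank k has weight 1 / k**exponent.
    weights = [1.0 / (k ** exponent) for k in range(1, vocab_size + 1)]
    ids = rng.choices(range(vocab_size), weights=weights, k=num_samples)
    counts = Counter(ids)  # occurrences of each ID in one "epoch"
    seen = len(counts)
    rare = sum(1 for c in counts.values() if c <= rare_max)
    return {"seen_ids": seen, "rare_ids": rare, "rare_fraction": rare / seen}


if __name__ == "__main__":
    print(id_frequency_profile(vocab_size=100_000, num_samples=200_000))
```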

What problem does this paper attempt to address?

This paper addresses the one-epoch overfitting phenomenon that is widespread in click-through rate (CTR) prediction models. In industrial applications, model performance drops significantly as soon as training enters the second epoch. Existing research has tried to identify the factors behind this phenomenon through extensive experiments. However, it remains unknown whether a multi-epoch training paradigm can achieve better results, because the best performance is usually obtained with one-epoch training. The paper hypothesizes that the phenomenon arises because the embedding layer is highly susceptible to overfitting, a susceptibility that may stem from the high-dimensional sparsity of the data. To avoid overfitting of the embedding layer while preserving feature sparsity, the paper proposes Multi-Epoch learning with Data Augmentation (MEDA), which can be applied directly to most deep CTR models. MEDA achieves data augmentation by reinitializing the embedding layer at the start of each training epoch. This prevents the embeddings from overfitting and also speeds up the convergence of the MLP. Experimental results show that MEDA significantly outperforms conventional one-epoch training and brings significant gains in real-world scenarios at Kuaishou.
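The core training loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration under several assumptions, not the authors' implementation. A single linear layer stands in for the MLP, all fields share one embedding table, and training uses plain SGD on log loss over a toy synthetic dataset (`make_toy_data` is invented for this example). The only MEDA-specific element is that the embedding table is re-created at the start of every epoch while the dense weights `w` and `b` carry over. The returned model pairs the last epoch's embeddings with the dense weights accumulated across all epochs.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_embeddings(vocab_size, dim, rng, scale=0.1):
    """A fresh embedding table: one small random vector per feature ID."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(vocab_size)]


def forward(ids, emb, w, b):
    """Concatenate one embedding per field, then apply a linear layer + sigmoid.
    The linear layer is a hypothetical stand-in for a deep CTR model's MLP;
    all fields index one shared table (a simplification)."""
    x = [v for fid in ids for v in emb[fid]]
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return x, sigmoid(z)


def train_meda(data, vocab_size, num_fields, dim=4, epochs=3, lr=0.5, seed=0):
    """Multi-epoch SGD on log loss in which the embedding table is
    reinitialized at the start of every epoch while (w, b) carry over."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(num_fields * dim)]
    b = 0.0
    emb = None
    for _ in range(epochs):
        emb = init_embeddings(vocab_size, dim, rng)  # MEDA-style embedding reset
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            ids, label = data[i]
            x, p = forward(ids, emb, w, b)
            g = p - label  # d(log loss)/dz for a sigmoid output
            # Update the embedding rows used by this example (pre-update w).
            for f, fid in enumerate(ids):
                for k in range(dim):
                    emb[fid][k] -= lr * g * w[f * dim + k]
            # Update the dense weights (pre-update embeddings saved in x).
            for j in range(len(w)):
                w[j] -= lr * g * x[j]
            b -= lr * g
    return emb, w, b


def log_loss(data, emb, w, b, eps=1e-12):
    """Mean binary cross-entropy of the final model on `data`."""
    total = 0.0
    for ids, label in data:
        _, p = forward(ids, emb, w, b)
        p = min(max(p, eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(data)


def make_toy_data(n, vocab_size, num_fields, seed=1):
    """Hypothetical synthetic clicks: label is 1 when the first field's ID is
    in the lower half of the vocabulary, flipped with probability 0.1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        ids = [rng.randrange(vocab_size) for _ in range(num_fields)]
        label = 1 if ids[0] < vocab_size // 2 else 0
        if rng.random() < 0.1:
            label = 1 - label
        data.append((ids, label))
    return data


if __name__ == "__main__":
    data = make_toy_data(n=400, vocab_size=20, num_fields=2)
    emb, w, b = train_meda(data, vocab_size=20, num_fields=2)
    print("training log loss:", round(log_loss(data, emb, w, b), 4))
```

Moving the `init_embeddings` call out of the epoch loop, so that it runs once before training, turns this into conventional multi-epoch training. That makes the two regimes easy to compare on held-out data. No particular outcome is claimed here for this toy setup.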

Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction

Click-Through Rate Prediction Algorithm Based on Modeling of Implicit High-Order Feature Importance

Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution

MISS: Multi-Interest Self-Supervised Learning Framework for Click-Through Rate Prediction

Multi-scale and Multi-Channel Neural Network for Click-Through Rate Prediction

Graph Relation Embedding Network for Click-Through Rate Prediction

MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction

Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling

Representation Learning-Assisted Click-Through Rate Prediction

Multi-view Click-through Rate Prediction Based on Multi-layer Deep Interest Network

A joint learning model for click-through prediction in display advertising

Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction

Continual Learning for CTR Prediction: A Hybrid Approach

Learning Graph Meta Embeddings for Cold-Start Ads in Click-Through Rate Prediction

A Hierarchical Attention Model for CTR Prediction Based on User Interest

Cross Domain LifeLong Sequential Modeling for Online Click-Through Rate Prediction

Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction

An adaptive hybrid XdeepFM based deep Interest network model for click-through rate prediction system

Multi-Interactive Attention Network for Fine-grained Feature Learning in CTR Prediction