Abstract: Click-through rate (CTR) prediction plays a critical role in recommender systems and web search. While many existing methods use ensemble learning to improve model performance, they typically limit the ensemble to two or three sub-networks, with little exploration of larger ensembles. In this paper, we investigate larger ensemble networks and find three inherent limitations in commonly used ensemble learning methods: (1) performance degradation with more networks; (2) sharp decline and high variance in sub-network performance; (3) large discrepancies between sub-network and ensemble predictions. To address these limitations simultaneously, this paper investigates potential solutions from the perspectives of Knowledge Distillation (KD) and Deep Mutual Learning (DML). Based on the empirical performance of these methods, we combine them to propose a novel model-agnostic Ensemble Knowledge Transfer Framework (EKTF). Specifically, we employ the collective decision-making of the students as an abstract teacher to guide each student (sub-network) towards more effective learning. Additionally, we encourage mutual learning among students to enable knowledge acquisition from different views. To address the issue of balancing the loss hyperparameters, we design a novel examination mechanism to ensure tailored teaching from teacher to student and selective learning between peers. Experimental results on five real-world datasets demonstrate the effectiveness and compatibility of EKTF. The code, running logs, and detailed hyperparameter configurations are available at: https://github.com/salmon1802/EKTF.

What problem does this paper attempt to address?

This paper attempts to solve three main problems faced by ensemble learning methods in the click-through rate (CTR) prediction task:

1. **Performance degradation as the number of networks increases**: When the number of sub-networks in the ensemble increases, overall model performance decreases rather than improves. This runs contrary to the general expectation that adding model parameters leads to better performance.
2. **Sharp decline in sub-network performance and high variance**: As the number of sub-networks increases, the performance of each sub-network drops significantly, and the performance differences between sub-networks become very large.
3. **Large differences between sub-network predictions and ensemble predictions**: Even for the best-performing sub-network, there is a significant gap between its predictions and those of the ensemble model. This gap reduces the flexibility of the model in practical applications.

To solve these problems, the authors explore the perspectives of knowledge distillation (KD) and deep mutual learning (DML), and propose a new model-agnostic ensemble knowledge transfer framework (EKTF). Specifically:

- **Knowledge distillation (KD)**: The collective decision of the sub-networks serves as an abstract teacher that guides the learning of each sub-network, providing an additional supervision signal. Experiments show that this method can effectively alleviate problems 1 and 2.
- **Deep mutual learning (DML)**: Sub-networks are encouraged to learn from each other, which promotes knowledge acquisition from different perspectives. Although DML fails to solve the performance degradation that comes with more networks (problem 1), it further improves the performance of individual sub-networks and thus better addresses problems 2 and 3.
Finally, the authors combine the advantages of KD and DML and design a new loss-adaptive balancing mechanism (called the "examination mechanism") to ensure that the teacher provides tailored teaching to each student and that students learn selectively from each other. The experimental results show that the EKTF framework performs well on five real-world datasets, demonstrating its effectiveness, compatibility, and flexibility.

In summary, this paper aims to solve the performance bottleneck and instability that ensemble learning faces in the CTR prediction task, thereby improving both the overall performance of the model and its flexibility in deployment.
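
The two knowledge-transfer signals described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the abstract teacher is a plain average of the students' predicted probabilities, that both the teacher-to-student (KD) and peer-to-peer (DML) terms are mean squared errors, and that the fixed weights `alpha` and `beta` stand in for the examination mechanism, whose details are not given here. All function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean(xs):
    return sum(xs) / len(xs)


def bce(probs, labels, eps=1e-7):
    """Binary cross-entropy against the ground-truth click labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(labels)


def mse(a, b):
    return mean([(x - y) ** 2 for x, y in zip(a, b)])


def knowledge_transfer_losses(logits, labels, alpha=1.0, beta=1.0):
    """logits: one list of raw outputs per student (needs >= 2 students).

    Returns the teacher predictions and one loss breakdown per student.
    alpha and beta are fixed placeholder weights (assumption).
    """
    probs = [[sigmoid(z) for z in row] for row in logits]
    n = len(probs)
    # Abstract teacher: collective decision of all students (assumed average).
    teacher = [mean(col) for col in zip(*probs)]
    results = []
    for i in range(n):
        ctr = bce(probs[i], labels)            # supervised CTR loss
        kd = mse(probs[i], teacher)            # teacher-to-student term
        dml = mean([mse(probs[i], probs[j])    # peer-to-peer term
                    for j in range(n) if j != i])
        results.append({"ctr": ctr, "kd": kd, "dml": dml,
                        "total": ctr + alpha * kd + beta * dml})
    return teacher, results


# Three students scoring a batch of two impressions.
teacher, losses = knowledge_transfer_losses(
    logits=[[2.0, -1.0], [1.5, -0.5], [0.5, 0.2]],
    labels=[1.0, 0.0],
)
```

In this sketch a student that disagrees strongly with its peers receives larger `kd` and `dml` terms, which pulls its predictions towards the collective decision. An adaptive mechanism would replace the constant `alpha` and `beta` with per-student weights.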

Ensemble Learning via Knowledge Transfer for CTR Prediction

Ensemble Knowledge Distillation for CTR Prediction

Enhanced Knowledge Transfer for Collaborative Filtering with Multi-Source Heterogeneous Feedbacks

Ensembled CTR Prediction Via Knowledge Distillation

A Collaborative Ensemble Framework for CTR Prediction

A Deeper Knowledge Tracking Model Integrating Cognitive Theory and Learning Behavior

Feature Interaction Fusion Self-Distillation Network For CTR Prediction

An ensemble learning framework for click-through rate prediction based on a reinforcement learning algorithm with parameterized actions

AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction

Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation

Improving Conversational Recommender System by Pretraining Billion-scale Knowledge Graph

Task Adaptive Multi-learner Network for Joint CTR and CVR Estimation

Collaborative Topic Regression for Online Recommender Systems: an Online and Bayesian Approach

Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction

Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution

TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation

Continual Learning for CTR Prediction: A Hybrid Approach

Retrieval-Oriented Knowledge for Click-Through Rate Prediction

CETN: Contrast-enhanced Through Network for CTR Prediction

MISS: Multi-Interest Self-Supervised Learning Framework for Click-Through Rate Prediction

DCNv3: Towards Next Generation Deep Cross Network for CTR Prediction