Ensemble Learning via Knowledge Transfer for CTR Prediction

Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang
2024-11-25
Abstract: Click-through rate (CTR) prediction plays a critical role in recommender systems and web search. While many existing methods use ensemble learning to improve model performance, they typically limit the ensemble to two or three sub-networks, leaving larger ensembles largely unexplored. In this paper, we investigate larger ensemble networks and find three inherent limitations in the commonly used ensemble learning method: (1) performance degradation as more networks are added; (2) a sharp decline and high variance in sub-network performance; (3) large discrepancies between sub-network and ensemble predictions. To address these limitations simultaneously, this paper investigates potential solutions from the perspectives of Knowledge Distillation (KD) and Deep Mutual Learning (DML). Based on the empirical performance of these methods, we combine them into a novel model-agnostic Ensemble Knowledge Transfer Framework (EKTF). Specifically, we use the collective decision-making of the students as an abstract teacher that guides each student (sub-network) toward more effective learning. Additionally, we encourage mutual learning among students so that they acquire knowledge from different views. To address the difficulty of balancing the loss hyperparameters, we design a novel examination mechanism that ensures tailored teaching from teacher to student and selective peer-to-peer learning. Experimental results on five real-world datasets demonstrate the effectiveness and compatibility of EKTF. The code, running logs, and detailed hyperparameter configurations are available at: https://github.com/salmon1802/EKTF.
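To make the three limitations concrete, here is a minimal sketch of a plain ensemble baseline with simple diagnostics for limitations (2) and (3). It assumes the ensemble fuses sub-networks by averaging their logits (averaging probabilities is an equally common choice), measures the spread of per-sub-network log loss, and reports the mean absolute gap between the best sub-network's predicted CTR and the ensemble's. The names `ensemble_predict` and `diagnose` are hypothetical and are not taken from the EKTF code.

```python
import math
from statistics import mean, pstdev


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy (log loss) of predicted CTR p for click label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def ensemble_predict(logits: list[float]) -> float:
    """Assumed fusion rule: sigmoid of the mean sub-network logit."""
    return sigmoid(mean(logits))


def diagnose(batch_logits: list[list[float]], labels: list[int]) -> dict[str, float]:
    """batch_logits[s][k] is the logit of sub-network k on sample s."""
    n_sub = len(batch_logits[0])
    ens = [ensemble_predict(row) for row in batch_logits]
    # Average log loss of each sub-network used on its own.
    sub_losses = [
        mean(bce(sigmoid(row[k]), y) for row, y in zip(batch_logits, labels))
        for k in range(n_sub)
    ]
    # Mean absolute gap between each sub-network's CTR and the ensemble CTR.
    gaps = [
        mean(abs(sigmoid(row[k]) - e) for row, e in zip(batch_logits, ens))
        for k in range(n_sub)
    ]
    best = min(range(n_sub), key=sub_losses.__getitem__)  # lowest log loss
    return {
        "ensemble_logloss": mean(bce(e, y) for e, y in zip(ens, labels)),
        "sub_logloss_mean": mean(sub_losses),
        "sub_logloss_std": pstdev(sub_losses),  # limitation (2): spread across sub-networks
        "best_sub_gap": gaps[best],             # limitation (3): best sub-network vs ensemble
    }
```

For example, `diagnose([[2.0, 0.0], [-1.0, -3.0]], [1, 0])` scores two sub-networks on two samples. Identical sub-networks yield a standard deviation and gap of zero, so larger values reflect the spread and discrepancy the abstract describes.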
Information Retrieval
What problem does this paper attempt to address?
This paper addresses three main problems that ensemble learning methods face in click-through rate (CTR) prediction:

1. **Performance degradation as the number of networks increases**: adding sub-networks to the ensemble lowers overall performance instead of raising it, contrary to the usual expectation that more model parameters yield better performance.
2. **Sharp decline and high variance in sub-network performance**: as the number of sub-networks grows, each sub-network performs markedly worse, and the performance differences between sub-networks become very large.
3. **Large differences between sub-network and ensemble predictions**: even the best-performing sub-network's predictions differ significantly from the ensemble's. This gap reduces the model's flexibility in practice, since no single sub-network can simply stand in for the full ensemble.

To solve these problems, the authors explore knowledge distillation (KD) and deep mutual learning (DML) and propose a new model-agnostic Ensemble Knowledge Transfer Framework (EKTF). Specifically:

- **Knowledge distillation (KD)**: the collective decision of the sub-networks serves as an abstract teacher that guides each sub-network, providing an additional supervision signal. Experiments show that this effectively alleviates problems 1 and 2.
- **Deep mutual learning (DML)**: sub-networks learn from one another, acquiring knowledge from different perspectives. Although DML does not solve the performance degradation as networks are added (problem 1), it further improves individual sub-networks and therefore better addresses problems 2 and 3.
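The two transfer signals described above can be sketched per sample as follows. This is a minimal illustration, not the EKTF implementation: it assumes the teacher is the ensemble's predicted CTR, uses the Bernoulli KL divergence as the distance for both KD and DML, and averages DML over all ordered pairs of distinct students. The function names are hypothetical. In an autodiff framework the teacher and peer targets would typically be treated as constants (for example, detached in PyTorch) so that each student is pulled toward them rather than the reverse.

```python
import math


def binary_kl(target: float, p: float, eps: float = 1e-7) -> float:
    """KL(Bernoulli(target) || Bernoulli(p)); zero when p equals target."""
    t = min(max(target, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return t * math.log(t / p) + (1.0 - t) * math.log((1.0 - t) / (1.0 - p))


def kd_loss(student_probs: list[float], teacher_prob: float) -> float:
    """Teacher-to-student: every student is pulled toward the ensemble's CTR."""
    return sum(binary_kl(teacher_prob, p) for p in student_probs) / len(student_probs)


def dml_loss(student_probs: list[float]) -> float:
    """Peer-to-peer: each student k is pulled toward every peer j != k."""
    n = len(student_probs)
    if n < 2:
        return 0.0  # no peers to learn from
    total = sum(
        binary_kl(student_probs[j], student_probs[k])  # peer j is the target for k
        for k in range(n)
        for j in range(n)
        if j != k
    )
    return total / (n * (n - 1))
```

Both terms vanish when all students already agree with the teacher and with each other; they grow as predictions diverge, which is the extra supervision KD and DML contribute on top of the click labels.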
Finally, the authors combine the strengths of KD and DML and design a new adaptive loss-balancing mechanism, called the "examination mechanism", which lets the teacher tailor its teaching to each student and lets students learn selectively from one another, addressing the difficulty of balancing the loss hyperparameters. Experiments on five real-world datasets show that EKTF is effective, compatible, and flexible. In summary, the paper tackles the performance bottleneck and instability of ensemble learning in CTR prediction, improving both the model's overall performance and its flexibility in application.
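The examination mechanism can be illustrated with a hypothetical gating rule. The paper's exact criterion is not stated in this summary, so the sketch below assumes that each model's per-sample log loss against the click label is its "exam score", that the teacher distills into a student only when the teacher scores better on that sample (tailored teaching), and that a student learns from a peer only when the peer scores better (selective learning). Under these assumptions the gates take the place of hand-tuned KD and DML weights; `examined_loss` is a hypothetical name.

```python
import math


def _clip(p: float, eps: float = 1e-7) -> float:
    return min(max(p, eps), 1.0 - eps)


def bce(p: float, y: int) -> float:
    """Log loss of predicted CTR p against click label y (the assumed 'exam score')."""
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def binary_kl(target: float, p: float) -> float:
    """KL(Bernoulli(target) || Bernoulli(p))."""
    t, p = _clip(target), _clip(p)
    return t * math.log(t / p) + (1.0 - t) * math.log((1.0 - t) / (1.0 - p))


def examined_loss(student_probs: list[float], teacher_prob: float, y: int) -> float:
    """Per-sample loss where hypothetical exam gates replace fixed KD/DML weights."""
    exams = [bce(p, y) for p in student_probs]  # lower score = better exam
    teacher_exam = bce(teacher_prob, y)
    # Assumed supervised terms: the ensemble (teacher) and every student fit the label.
    loss = teacher_exam + sum(exams)
    n = len(student_probs)
    for k, p in enumerate(student_probs):
        # Tailored teaching: distill only into students the teacher outscores here.
        if teacher_exam < exams[k]:
            loss += binary_kl(teacher_prob, p)
        # Selective learning: learn only from peers with a better exam score.
        for j in range(n):
            if j != k and exams[j] < exams[k]:
                loss += binary_kl(student_probs[j], p) / (n - 1)
    return loss
```

With `student_probs=[0.9, 0.2]`, `teacher_prob=0.55`, and a click (`y=1`), only the weaker second student receives KD and DML terms; the first student, which already outscores both the teacher and its peer, is trained on the label loss alone.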