SS4CTR: a Semi-Supervised Framework for Enhancing Click-Through Rate Prediction in Sparse and Imbalanced Data

Junming Zhou, Chao Chang, Weisheng Li, Ronghua Lin, Zhengyang Wu, Yong Tang
DOI: https://doi.org/10.1007/s11280-024-01310-2
2024-01-01
World Wide Web
Abstract: Click-Through Rate (CTR) prediction, which estimates the probability of a user clicking on a particular item, is a pivotal component of both online advertising and recommender systems. However, the problems of sparse and imbalanced data have yet to be resolved. To cope with these challenges, this paper proposes a semi-supervised framework called SS4CTR. Two distinctive features characterize the proposed SS4CTR model. First, it employs an interpretable approach to select negative samples based on the global popularity of items, ensuring a balanced ratio of positive and negative samples within the input dataset. Second, by integrating both labeled and unlabeled data into the training process, the model tackles the challenge of data sparsity and significantly improves the accuracy of user click-through rate predictions. A confidence threshold mechanism for pseudo-labeling ensures that unlabeled data is used safely. To the best of our knowledge, this is the first study to address the key challenges posed by sparse and imbalanced data simultaneously in the context of CTR prediction. Extensive experiments conducted on four real-world sparse datasets confirm the effectiveness and applicability of the SS4CTR model in scenarios characterized by sparse and imbalanced data.
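The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function names `sample_negatives` and `pseudo_label`, the choice to weight unclicked items by their global click count, and the default threshold of 0.9 are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def sample_negatives(clicks, user, n, rng):
    """Draw up to n items the user has not clicked, without replacement.

    Assumption (not from the paper): an unclicked item is more likely to be
    picked as a negative the more popular it is globally.
    `clicks` is a list of (user, item) pairs.
    """
    popularity = Counter(item for _, item in clicks)
    seen = {item for u, item in clicks if u == user}
    candidates = sorted(i for i in popularity if i not in seen)
    weights = [popularity[i] for i in candidates]
    chosen = []
    for _ in range(min(n, len(candidates))):
        # Weighted draw of one remaining candidate, then remove it.
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return chosen


def pseudo_label(probabilities, threshold=0.9):
    """Return (index, label) pairs for confident predictions only.

    Assumption (not from the paper): a prediction at or above `threshold`
    becomes a pseudo-positive, one at or below 1 - threshold becomes a
    pseudo-negative, and everything in between stays unlabeled.
    """
    labelled = []
    for idx, p in enumerate(probabilities):
        if p >= threshold:
            labelled.append((idx, 1))
        elif p <= 1.0 - threshold:
            labelled.append((idx, 0))
    return labelled
```

Under these assumptions, sampling one negative per observed click would give the balanced positive-to-negative ratio the abstract describes, and only the samples returned by `pseudo_label` would be added to the training set in the next round.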