STEC: See-Through Transformer-based Encoder for CTR Prediction

Serdarcan Dilbaz, Hasan Saribas
2024-05-21
Abstract: Click-Through Rate (CTR) prediction holds a pivotal place in online advertising and recommender systems since CTR prediction performance directly influences the overall satisfaction of the users and the revenue generated by companies. Even so, CTR prediction is still an active area of research since it involves accurately modelling the preferences of users based on sparse and high-dimensional features where the higher-order interactions of multiple features can lead to different outcomes.
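To make "sparse and high-dimensional features" and "feature interactions" concrete, here is a minimal Python sketch of a generic factorization-machine-style second-order term. It is a classic baseline, not STEC itself. The field names, the hashed vocabulary size of 100,000, the 8-dimensional random embeddings, the use of `zlib.crc32` for hashing, and the helper names `field_indices`, `pairwise_interactions`, and `ctr` are all illustrative assumptions.

```python
import zlib
from itertools import combinations

import numpy as np

VOCAB_SIZE = 100_000  # hypothetical size of the hashed one-hot feature space
DIM = 8               # hypothetical embedding dimension


def field_indices(sample, vocab_size=VOCAB_SIZE):
    # Hash each (field, value) pair to one index of the huge one-hot space.
    # zlib.crc32 is used because Python's built-in str hash is salted per process.
    return [zlib.crc32(f"{field}={value}".encode()) % vocab_size
            for field, value in sample.items()]


def pairwise_interactions(E):
    # Second-order interactions: inner product of every pair of field embeddings.
    return np.array([E[i] @ E[j] for i, j in combinations(range(len(E)), 2)])


def ctr(sample, table, bias=0.0):
    # Only the rows for the active features are looked up; the rest of the
    # sparse one-hot input is never materialized.
    E = table[field_indices(sample, table.shape[0])]  # (num_fields, DIM)
    logit = bias + pairwise_interactions(E).sum()
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> click probability


# Hypothetical, untrained embedding table and a single three-field sample.
table = np.random.default_rng(0).normal(0.0, 0.1, (VOCAB_SIZE, DIM))
sample = {"user_id": "u_123", "ad_category": "sports", "hour": 21}
print(ctr(sample, table))
```

Each sample activates only a handful of rows out of a very large vocabulary, which is what makes the input sparse and high-dimensional. A trained model would also learn first-order weights and fit the table with log loss. Models such as the one described below aim to capture interactions beyond this fixed pairwise term.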
Information Retrieval
What problem does this paper attempt to address?
The paper focuses on the click-through rate (CTR) prediction problem, a critical task in online advertising and recommender systems because it directly affects user satisfaction and company revenue. Although many models attempt to model user preferences accurately from sparse, high-dimensional features, CTR prediction remains an active research area, because high-order interactions among multiple features can lead to different outcomes. Most CTR prediction models rely on a single fusion and interaction-learning strategy, while the few models that use multiple interaction-modeling strategies treat each type of interaction as independent.

The paper proposes a new model called STEC (See-Through Transformer-based Encoder), which combines multiple interaction-learning methods in a unified architecture and introduces residual connections at different interaction levels. These connections let low-order interactions directly influence the prediction, which improves performance. In extensive experiments on four real-world datasets, STEC shows greater expressive power than existing state-of-the-art methods, which translates into superior CTR prediction performance.

The core of STEC is the STEC block, which modifies the dot-product attention formula so that it also extracts bilinear interactions at the same time. STEC can also run multiple attention layers and bilinear interactions in parallel, learning different interaction subspaces at different positions. The overall architecture resembles the Transformer: layers of STEC blocks are interleaved with position-wise fully connected feed-forward networks (FFN) to produce the CTR prediction.

The paper conducts quantitative evaluations, including offline evaluations on public datasets and online evaluations in industrial environments. The results show that STEC performs as well as or better than existing attention-based models on multiple datasets while having a smaller parameter footprint.
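The STEC block described above can be sketched in NumPy under several stated assumptions. The observation used here is that the pre-softmax attention score between fields i and j, q_i · k_j = e_i W_Q W_K^T e_j^T, is already a bilinear form, so a single attention pass can return both the attention output and these raw scores as bilinear interactions. Whether STEC computes its bilinear term exactly this way is an assumption, as is the "see-through" path that feeds every layer's pairwise scores straight to the output layer. The function names (`stec_like_block`, `ffn`, `predict_ctr`, `init_params`), all shapes, the random initialization, and the omission of layer normalization and training are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stec_like_block(X, Wq, Wk, Wv):
    """Multi-head attention that also returns bilinear interaction scores.

    X: (F, d) field representations for one example.
    Wq, Wk, Wv: (H, d, d_h) per-head projections (hypothetical shapes).
    scores[h, i, j] = X[i] @ Wq[h] @ Wk[h].T @ X[j], a bilinear form.
    """
    Q = np.einsum("fd,hdk->hfk", X, Wq)   # (H, F, d_h)
    K = np.einsum("fd,hdk->hfk", X, Wk)
    V = np.einsum("fd,hdk->hfk", X, Wv)
    scores = Q @ K.transpose(0, 2, 1)     # (H, F, F) bilinear interactions
    attn = softmax(scores / np.sqrt(Wq.shape[-1]), axis=-1)
    out = attn @ V                        # (H, F, d_h)
    out = out.transpose(1, 0, 2).reshape(X.shape[0], -1)  # concat heads -> (F, H*d_h)
    return out, scores


def ffn(X, W1, W2):
    # Position-wise feed-forward network: the same MLP applied to each field.
    return np.maximum(X @ W1, 0.0) @ W2


def predict_ctr(E, layers, w_out):
    X, see_through = E, []
    iu = np.triu_indices(E.shape[0], k=1)  # each unordered field pair once
    for Wq, Wk, Wv, W1, W2 in layers:
        attn_out, scores = stec_like_block(X, Wq, Wk, Wv)
        X = X + attn_out                  # residual; requires H * d_h == d
        X = X + ffn(X, W1, W2)            # residual around the FFN
        # Assumed see-through path: this layer's pairwise scores go directly
        # to the output, so lower-level interactions reach the prediction.
        see_through.append(scores[:, iu[0], iu[1]].ravel())
    features = np.concatenate([X.ravel()] + see_through)
    return 1.0 / (1.0 + np.exp(-(features @ w_out)))  # sigmoid -> CTR


def init_params(F=4, d=8, H=2, n_layers=2, d_ff=16, seed=0):
    # Small random weights for a hypothetical, untrained model.
    rng = np.random.default_rng(seed)
    d_h = d // H
    layers = []
    for _ in range(n_layers):
        Wq, Wk, Wv = (rng.normal(0.0, 0.1, (H, d, d_h)) for _ in range(3))
        W1 = rng.normal(0.0, 0.1, (d, d_ff))
        W2 = rng.normal(0.0, 0.1, (d_ff, d))
        layers.append((Wq, Wk, Wv, W1, W2))
    n_pairs = F * (F - 1) // 2
    w_out = rng.normal(0.0, 0.1, F * d + n_layers * H * n_pairs)
    return layers, w_out


layers, w_out = init_params()
E = np.random.default_rng(1).normal(size=(4, 8))  # 4 fields, 8-dim embeddings
print(predict_ctr(E, layers, w_out))
```

Reusing the query-key scores means the bilinear interactions add no extra projection matrices beyond those attention already needs, which is one way a design like this could keep the parameter count low. In the second layer the scores are bilinear in that layer's input rather than in the raw embeddings.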
Furthermore, STEC is interpretable, which gives more insight into the interactions the model learns.