Abstract: In sponsored search engines, pre-trained language models have shown promising improvements in Click-Through-Rate (CTR) prediction. A widely used approach to utilizing pre-trained language models in CTR prediction is to fine-tune them on click labels, with early stopping at the peak Area Under the ROC Curve (AUC). The output of the fine-tuned model, i.e., the final score or an intermediate embedding generated by the language model, is then fed as a new Natural Language Processing (NLP) feature into the CTR prediction baseline. This cascade approach avoids complicating the CTR prediction baseline while preserving flexibility and agility. However, we show in this work that calibrating the language model separately, based on its peak single-model AUC, does not always yield the NLP features that ultimately perform best in the CTR prediction model. Our analysis reveals that this misalignment is due to overlap and redundancy between the new NLP features and the existing features in the CTR prediction baseline. In other words, the NLP features can improve CTR prediction more if this overlap is reduced. To this end, we introduce a simple and general joint-training framework that fine-tunes the language model together with the existing features of the CTR prediction baseline, so that the NLP feature captures supplementary knowledge. Moreover, we develop an efficient Supplementary Knowledge Distillation (SuKD) method that transfers the supplementary knowledge learned by a heavy language model to a light, servable model. Comprehensive experiments on both public and commercial data demonstrate that the new NLP features produced by the joint-training framework significantly outperform those produced by independent fine-tuning on click labels. We also show that the light model distilled with SuKD yields a clear AUC improvement in CTR prediction over traditional feature-based knowledge distillation.
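The abstract does not give the training objective, so what follows is only a minimal sketch under our own assumptions, not the authors' method. A logistic model stands in for the language model, two toy numeric features stand in for text, and "joint training" is modeled as fitting the NLP score on top of a frozen baseline logit so that it can only learn the residual signal. The helpers `fit_logit` and `score`, the variable `base_logit`, and the synthetic data generator are all hypothetical. SuKD is not illustrated.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """Rank-based ROC AUC: P(random positive scores above random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logit(xs, labels, offsets=None, lr=0.5, epochs=150):
    """Full-batch gradient descent on log loss for logit = offset + w.x + b.

    offsets=None -> independent fit on click labels alone.
    offsets=base -> the baseline logit is frozen; only the residual w, b is learned.
    """
    n, dim = len(xs), len(xs[0])
    offsets = offsets if offsets is not None else [0.0] * n
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y, o in zip(xs, labels, offsets):
            err = sigmoid(o + b + sum(wi * xi for wi, xi in zip(w, x))) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(xs, w, b):
    """Linear logit w.x + b for each row."""
    return [b + sum(wi * xi for wi, xi in zip(w, x)) for x in xs]


# Hypothetical toy data: `a` is visible to both the baseline and the "text"
# model (overlap); `c` is visible only to the text model (supplementary).
rng = random.Random(0)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1500)]
labels = [1 if rng.random() < sigmoid(2 * a + c) else 0 for a, c in rows]
base_x = [[a] for a, _ in rows]
text_x = [[a, c] for a, c in rows]

# Existing CTR baseline: trained first, then frozen.
bw, bb = fit_logit(base_x, labels)
base_logit = score(base_x, bw, bb)

# (1) Independent fine-tuning on click labels: re-learns the overlapping `a`.
iw, ib = fit_logit(text_x, labels)
# (2) Joint training on top of the frozen baseline logit: learns the residual.
jw, jb = fit_logit(text_x, labels, offsets=base_logit)

joint_scores = [o + s for o, s in zip(base_logit, score(text_x, jw, jb))]
print("independent weights:", [round(v, 2) for v in iw])
print("joint weights:      ", [round(v, 2) for v in jw])
print("baseline AUC:", round(auc(base_logit, labels), 4))
print("baseline + joint NLP score AUC:", round(auc(joint_scores, labels), 4))
```

In this linear toy, the independently fitted score puts a large weight on the shared feature `a`, which duplicates what the baseline already encodes. The jointly trained score keeps roughly the same weight on the text-only feature `c`, but its weight on `a` shrinks to about the gap between the true coefficient and the baseline's own coefficient. A real CTR model with a nonlinear language model would not decompose this cleanly; the sketch only makes the notion of overlap concrete.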

Multi-Grained Topological Pre-Training of Language Models in Sponsored Search

AdsGNN: Behavior-Graph Augmented Relevance Modeling in Sponsored Search

Improving Relevance Modeling Via Heterogeneous Behavior Graph Learning in Bing Ads

Leveraging Bidding Graphs for Advertiser-Aware Relevance Modeling in Sponsored Search

SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search

CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search

AdsCVLR: Commercial Visual-Linguistic Representation Modeling in Sponsored Search

A language model approach to capture commercial intent and information relevance for sponsored search.

Language Models-enhanced Semantic Topology Representation Learning for Temporal Knowledge Graph Extrapolation

Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language

Pretrained Language Model based Web Search Ranking: From Relevance to Satisfaction

Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study

Improving Topic Relevance Model by Mix-structured Summarization and LLM-based Data Augmentation

Pretraining Language Models with Text-Attributed Heterogeneous Graphs

Personalized Attraction Enhanced Sponsored Search with Multi-task Learning

Understanding and Modeling Job Marketplace with Pretrained Language Models

Multi Page Search with Reinforcement Learning to Rank

Multi-task Pre-training Language Model for Semantic Network Completion

Learning Supplementary NLP Features for CTR Prediction in Sponsored Search

LEAD-ID: Language-Enhanced Denoising and Intent Distinguishing Graph Neural Network for Sponsored Search Broad Retrievals

QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search