Abstract: As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) have established state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to a large-scale web search system due to the following challenges: (1) the computational cost of massive neural PLMs, especially for the long texts in web documents, prohibits their deployment in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and ad-hoc retrieval scenarios, which demand comprehensive relevance modeling, is another main barrier to improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, and thus the compatibility of the individually fine-tuned ranking model is critical for a cooperative ranking system. In this work, we contribute a series of successfully applied techniques for tackling these issues when deploying the state-of-the-art Chinese pre-trained language model, i.e., ERNIE, in an online search engine system. We first articulate a novel practice to cost-efficiently summarize the web document and contextualize the resultant summary content with the query using a cheap yet powerful Pyramid-ERNIE architecture. Then we introduce an innovative paradigm to finely exploit large-scale noisy and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored for the online ranking system, aiming to stabilize the ranking signals across various online components. Extensive offline and online experimental results show that the proposed techniques significantly boost the search engine's performance.

Axiomatically Regularized Pre-training for Ad Hoc Search

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index

Pre-training Methods in Information Retrieval

ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback

Unleashing Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification

B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

AutoADR: Automatic Model Design for Ad Relevance

Optimizing Dense Retrieval Model Training with Hard Negatives

Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors

Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models

CLIPRerank: An Extremely Simple Method for Improving Ad-hoc Video Search

Divide and Conquer: Hybrid Pre-training for Person Search

Pre-trained Language Model based Ranking in Baidu Search

ArT: All-round Thinker for Unsupervised Commonsense Question-Answering

APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

Improving Retrieval Augmented Language Model with Self-Reasoning

A Simple yet Effective Framework for Active Learning to Rank

Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?

A Self-supervised Joint Training Framework for Document Reranking