Abstract: Industrial recommendation systems (RS) rely on a multi-stage pipeline to balance effectiveness and efficiency when delivering items from a vast corpus to users. Existing RS benchmark datasets primarily focus on the exposure space, where novel RS algorithms are trained and evaluated. However, when these algorithms transition to real-world industrial RS, they face the critical challenge of handling unexposed items, which occupy a significantly larger space than the exposed ones. This discrepancy profoundly impacts their practical performance. Additionally, these algorithms often overlook the intricate interplay between multiple RS stages, resulting in suboptimal overall system performance. To address these issues, we introduce RecFlow, an industrial full flow recommendation dataset designed to bridge the gap between offline RS benchmarks and the real online environment. Unlike existing datasets, RecFlow includes not only samples from the exposure space but also the unexposed items filtered out at each stage of the RS funnel. Our dataset comprises 38M interactions from 42K users across nearly 9M items, with an additional 1.9B stage samples collected from 9.3M online requests over 37 days and spanning 6 stages. Leveraging the RecFlow dataset, we conduct exploratory experiments that showcase its potential for designing new algorithms that enhance effectiveness by incorporating stage-specific samples. Some of these algorithms have already been deployed online, consistently yielding significant gains. We propose RecFlow as the first comprehensive benchmark dataset for the RS community, supporting research on designing algorithms at any stage, the study of selection bias, debiased algorithms, multi-stage consistency and optimality, multi-task recommendation, and user behavior modeling.
The RecFlow dataset, along with the corresponding source code, is available at https://github.com/RecFlow-ICLR/RecFlow.
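
To make the notion of stage samples concrete, the following minimal sketch simulates a recommendation funnel that logs every candidate at every stage, including the items each stage filters out. It is an illustration under stated assumptions, not the RecFlow schema or collection code: the stage names, the per-stage sizes, the `StageSample` record, and the random placeholder scorer are all hypothetical, and items that survive the final stage are simply treated as exposed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSample:
    """One logged candidate at one funnel stage (hypothetical record layout)."""
    request_id: int
    item_id: int
    stage: str      # stage at which the item was scored
    passed: bool    # True if the item survived this stage's filter


def run_funnel(request_id, candidates, stages, score_fn):
    """Pass candidates through each (name, keep_k) stage, logging every item."""
    log = []
    current = list(candidates)
    for name, keep_k in stages:
        ranked = sorted(current, key=lambda item: score_fn(name, item), reverse=True)
        kept, dropped = ranked[:keep_k], ranked[keep_k:]
        log.extend(StageSample(request_id, i, name, True) for i in kept)
        log.extend(StageSample(request_id, i, name, False) for i in dropped)
        current = kept  # only survivors reach the next stage
    return current, log


# Hypothetical three-stage funnel; names and sizes are illustrative only.
STAGES = [("retrieval", 50), ("pre_rank", 10), ("rank", 3)]
rng = random.Random(0)
scores = {}


def score_fn(stage, item):
    # Placeholder scorer: a fixed random score per (stage, item) pair.
    return scores.setdefault((stage, item), rng.random())


exposed, stage_log = run_funnel(1, range(200), STAGES, score_fn)
unexposed = [s for s in stage_log if not s.passed]
```

In this toy run, 3 items are exposed while 197 filtered samples are logged across the three stages, which mirrors on a small scale why the unexposed space is much larger than the exposure space.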

RankFlow

RankFlow: Joint Optimization of Multi-Stage Cascade Ranking Systems as Flows

Both Efficiency and Effectiveness! A Large Scale Pre-ranking Framework in Search System

Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation

Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

Residual Multi-Task Learner for Applied Ranking

Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems

RankTower: A Synergistic Framework for Enhancing Two-Tower Pre-Ranking Model

RecFlow: An Industrial Full Flow Recommendation Dataset

Cascade Ranking for Operational E-commerce Search

Ada-Ranker

CasFlow: Exploring Hierarchical Structures and Propagation Uncertainty for Cascade Prediction

A Knowledge-Fusion Ranking System with an Attention Network for Making Assignment Recommendations

FAA: Fine-grained Attention Alignment for Cascade Document Ranking

Unleashing the Potential of Multi-Channel Fusion in Retrieval for Personalized Recommendations

Mixed Information Flow for Cross-domain Sequential Recommendations

FairRank: Fairness-aware Single-tower Ranking Framework for News Recommendation

FLOW: A Feedback LOop FrameWork for Simultaneously Enhancing Recommendation and User Agents

Generative Flow Network for Listwise Recommendation

Cross-Stage Transfer in Multi-Stage Cascade Ranking and Filtering Systems

Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models