Autor3: Automated Real-time Ranking with Reinforcement Learning in E-commerce Sponsored Search Advertising

Yusi Zhang, Zhi Yang, Liang Wang, Li He
DOI: https://doi.org/10.1145/3357384.3357808
2019-01-01
Abstract: Sponsored search platforms rank advertisements (ads) with a ranking function that determines both impression allocation and the price charged to advertisers. To place ads optimally, it is highly desirable, but remains challenging, to adapt the ranking function to ad traffic at both large scale and fine granularity. In this paper, we propose an automatic adaptive auction system called Autor3. Our system leverages the variability and correlation of ad traffic within a search session and models the ranking of ads in a session as a multi-step decision-making problem. With effective yet lightweight abstractions of auction states and ranking actions, Autor3 builds a reinforcement learning (RL) framework to learn ranking decisions at the fine granularity of page views (i.e., impressions) over large-scale auction volume. Our offline experiments show that methods that account for sequential decisions are superior to those that do not. We deployed Autor3 to process billion-scale impressions per day at Taobao, the largest e-commerce platform in China. Using an online A/B test and a subsequent full-scale deployment, we show that both Revenue-Per-Mille (RPM) and Click-Through Rate (CTR) improve compared to the previous keyword-level approach used in Taobao's live production environment.