Follow the LIBRA: Guiding Fair Policy for Unified Impression Allocation Via Adversarial Rewarding.

Xiaoyu Wang, Yonghui Guo, Bin Tan, Tao Yang, Dongbo Huang, Lan Xu, Hao Zhou, Xiang-Yang Li
DOI: https://doi.org/10.1145/3616855.3635756
2024-01-01
Abstract: The diverse demands of advertisers (brand effects or immediate outcomes) lead to distinct selling patterns (pre-agreed volumes with an under-delivery penalty, or per-auction competition) and pricing patterns (fixed prices or varying bids) in guaranteed delivery (GD) and real-time bidding (RTB) advertising. This necessitates fair impression allocation that unifies the two markets to promote ad content diversity and overall revenue. Existing approaches often deprive RTB ads of equal exposure opportunities by prioritizing GD ads, and coarse-grained methods struggle with three challenges: 1) ambiguous reward due to the varied objectives and constraints of GD fulfillment and RTB utility, which hinders measuring each allocation's contribution to the global interests; 2) intensified competition from the coexistence of GD and RTB ads, which complicates their mutual relationships; 3) policy degradation caused by evolving user traffic and bid landscapes, which requires adaptivity to distribution shifts. We propose LIBRA, a generative-adversarial framework that unifies GD and RTB ads through request-level modeling. To guide the generative allocator, we solve convex optimization problems on historical data to derive hindsight-optimal allocations that balance fairness and utility. We then train a discriminator to distinguish the generated actions from the demonstrations of this solved latent expert policy, providing an integrated reward that aligns LIBRA with the optimal fair policy. LIBRA employs a self-attention encoder to capture the competitive relations among the varying number of candidate ads per allocation. Further, it enhances the discriminator with an information-bottleneck-based summarizer to prevent overfitting to irrelevant distractors in the ad environment. LIBRA adopts a decoupled structure in which the offline discriminator is continuously fine-tuned on newly arriving allocations and periodically guides updates of the online allocation policy to accommodate online dynamics.
LIBRA has been deployed on the Tencent advertising system for over four months, with extensive experiments conducted. Online A/B tests demonstrate significant lifts in ad income (3.17%), overall click-through rate (1.56%), and cost-per-mille (3.20%), contributing a daily revenue increase of hundreds of thousands of RMB.
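
The adversarial rewarding idea in the abstract (a discriminator that separates generated allocations from hindsight-optimal demonstrations, whose output then serves as the allocator's reward) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the two-dimensional features (`gd_shortfall`, `rtb_value`), the logistic discriminator, the toy data, and the GAIL-style reward `-log(1 - D)` are all hypothetical stand-ins; the abstract does not specify the reward form or the discriminator architecture.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def discriminator_logit(features, weights, bias):
    """Linear discriminator score (hypothetical stand-in for a learned network)."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def adversarial_reward(logit: float) -> float:
    """GAIL-style reward -log(1 - D), which equals softplus(logit).

    Assumed reward form; computed stably as max(l, 0) + log1p(exp(-|l|)).
    """
    return max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))

def train_discriminator(expert, generated, lr=0.5, epochs=200):
    """Fit the logistic discriminator with plain SGD on the cross-entropy loss.

    Expert (hindsight-optimal) samples are labeled 1, generated samples 0.
    """
    weights = [0.0] * len(expert[0])
    bias = 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in generated]
    for _ in range(epochs):
        for x, y in data:
            grad = sigmoid(discriminator_logit(x, weights, bias)) - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

# Toy, made-up features per allocation: (gd_shortfall, rtb_value).
expert_allocations = [[0.1, 0.9], [0.2, 0.8]]       # stand-in for solved demonstrations
generated_allocations = [[0.8, 0.3], [0.9, 0.2]]    # stand-in for the policy's actions

weights, bias = train_discriminator(expert_allocations, generated_allocations)

# Reward assigned to an expert-like and a policy-like allocation.
expert_reward = adversarial_reward(discriminator_logit([0.15, 0.85], weights, bias))
generated_reward = adversarial_reward(discriminator_logit([0.85, 0.25], weights, bias))
print(expert_reward, generated_reward)
```

In this sketch an allocation that resembles the demonstrations receives a higher reward than one that does not, which is the signal a generative allocator would be trained to maximize. A single scalar reward of this kind is one way to sidestep hand-weighting GD fulfillment against RTB utility, though how LIBRA actually combines them is not stated in the abstract.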