Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li
2024-07-03
Abstract: This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We treat scenarios such as search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns the scenarios under a shared objective and lets them communicate their strategies to improve overall performance. Our results show marked improvements in click-through rate (CTR), conversion rate, and total sales, supporting the method's efficacy in practical settings.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses joint optimization across multiple scenarios (such as search, recommendation, and advertising) on e-commerce platforms. Current optimization techniques typically handle each scenario in isolation, which leads to a disjointed user experience and suboptimal overall performance: when users switch frequently between scenarios, independently optimized strategies can conflict with one another. To solve this, the authors propose a method based on Multi-Agent Reinforcement Learning (MARL) and introduce the Multi-Agent Recurrent Deterministic Policy Gradient algorithm (MA-RDPG; abbreviated MARDPG in the abstract) to optimize the scenarios collaboratively. The approach considers both the objective within each scenario and the performance of the platform as a whole, and the authors report improvements in key performance indicators such as click-through rate (CTR), conversion rate, and total sales. According to the experimental results, MA-RDPG facilitates cooperation between scenarios better than traditional Learning to Rank (L2R) methods and increases overall revenue.
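The cooperation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ScenarioActor`, `MessageCell`, and `run_session`, all dimensions, and the random weights are hypothetical, and a plain tanh recurrent cell stands in for whatever communication module the paper uses. The sketch shows only the forward pass, in which each scenario's deterministic policy acts on its local observation plus a message shared across scenarios.

```python
import math
import random


def linear_tanh(weights, inputs):
    """Apply a weight matrix to an input vector, then squash with tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ScenarioActor:
    """Hypothetical deterministic policy for one scenario (e.g. search)."""

    def __init__(self, obs_dim, msg_dim, act_dim, rng):
        self.weights = random_matrix(act_dim, obs_dim + msg_dim, rng)

    def act(self, obs, msg):
        # The action depends on the local observation AND the shared message.
        return linear_tanh(self.weights, obs + msg)


class MessageCell:
    """Hypothetical recurrent cell that carries information between scenarios."""

    def __init__(self, obs_dim, act_dim, msg_dim, rng):
        self.weights = random_matrix(msg_dim, msg_dim + obs_dim + act_dim, rng)

    def update(self, msg, obs, action):
        # The new message summarizes the old message, the observation, and the action.
        return linear_tanh(self.weights, msg + obs + action)


def run_session(actors, cell, observations, scenario_ids, msg_dim):
    """Step through one user session that switches between scenarios."""
    msg = [0.0] * msg_dim
    actions = []
    for obs, scenario in zip(observations, scenario_ids):
        action = actors[scenario].act(obs, msg)
        msg = cell.update(msg, obs, action)
        actions.append(action)
    return actions, msg


rng = random.Random(0)
OBS, MSG, ACT = 4, 3, 2  # assumed sizes, chosen only for illustration
actors = {
    "search": ScenarioActor(OBS, MSG, ACT, rng),
    "recommendation": ScenarioActor(OBS, MSG, ACT, rng),
}
cell = MessageCell(OBS, ACT, MSG, rng)
observations = [[rng.gauss(0.0, 1.0) for _ in range(OBS)] for _ in range(3)]
scenario_ids = ["search", "recommendation", "search"]
actions, final_msg = run_session(actors, cell, observations, scenario_ids, MSG)
```

Training is omitted here. In a cooperative setting such as the one the paper describes, all actors would be updated against a single shared reward (for example, platform-wide sales) rather than per-scenario rewards.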