Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

Xu He, Bo An, Yanghua Li, Haikai Chen, Rundong Wang, Xinrun Wang, Runsheng Yu, Xin Li, Zhirong Wang
DOI: https://doi.org/10.48550/arXiv.2008.09369
2020-08-29
Abstract: With the rise of online e-commerce platforms, more and more customers prefer to shop online. To sell more products, online platforms introduce various modules to recommend items with different properties such as huge discounts. A web page often consists of different independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which might result in competition between modules. Thus, the global policy of the whole page could be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach with the restriction that different modules cannot communicate. Our contributions are three-fold. Firstly, inspired by a solution concept in game theory named correlated equilibrium, we design a signal network to promote cooperation of all modules by generating signals (vectors) for different modules. Secondly, an entropy-regularized version of the signal network is proposed to coordinate agents' exploration of the optimal global policy. Furthermore, experiments based on real-world e-commerce data demonstrate that our algorithm obtains superior performance over baselines.
Machine Learning, Artificial Intelligence, Multiagent Systems
What problem does this paper attempt to address?
This paper addresses a coordination problem in multi-module recommendation systems: the ranking policies of the modules on a page are optimized independently by different teams, so the modules do not cooperate and may even compete, leaving the global policy of the whole page sub-optimal. Concretely, a web page may contain several recommendation modules, each displaying products with a particular property, such as large discounts. Because the modules are owned by different teams and optimized individually without communicating, the same product or category can appear in several modules at once, which wastes limited page space and degrades the user experience. The paper therefore aims to design a collaboration mechanism that moves the page toward a globally optimal policy, using multi-agent reinforcement learning to promote cooperation among modules that cannot communicate with each other.
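The duplication problem described above can be made concrete with a small toy example. The sketch below is only an illustration under stated assumptions, not the paper's algorithm: the module names, items, and scores are made up, and the hand-written greedy `make_signals` coordinator merely stands in for the learned signal network mentioned in the abstract. It shows two modules that each rank by their own scores and end up showing the same items, and then the same modules conditioning on a per-module signal (here, a set of allowed items) without exchanging any information with each other.

```python
from typing import Dict, List, Optional, Set

# Hypothetical relevance scores per module (made-up numbers for illustration).
SCORES: Dict[str, Dict[str, float]] = {
    "discount_module": {"shoes": 0.9, "phone": 0.8, "book": 0.4, "lamp": 0.3},
    "trending_module": {"shoes": 0.95, "phone": 0.7, "lamp": 0.6, "book": 0.5},
}


def top_k(scores: Dict[str, float], k: int,
          allowed: Optional[Set[str]] = None) -> List[str]:
    """Rank items by score, optionally restricted to an allowed set."""
    candidates = [i for i in scores if allowed is None or i in allowed]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]


def independent_page(k: int) -> Dict[str, List[str]]:
    """Each module optimizes alone and ignores the other modules."""
    return {m: top_k(s, k) for m, s in SCORES.items()}


def make_signals(k: int) -> Dict[str, Set[str]]:
    """Hypothetical coordinator: greedily give each item to the module that
    scores it highest, with at most k items per module. This is an assumed
    stand-in for a learned signal generator, not the paper's network."""
    pairs = sorted(
        ((score, m, item) for m, s in SCORES.items() for item, score in s.items()),
        reverse=True,
    )
    signals: Dict[str, Set[str]] = {m: set() for m in SCORES}
    assigned: Set[str] = set()
    for _, module, item in pairs:
        if item not in assigned and len(signals[module]) < k:
            signals[module].add(item)
            assigned.add(item)
    return signals


def coordinated_page(k: int) -> Dict[str, List[str]]:
    """Each module sees only its own signal; modules never talk to each other."""
    signals = make_signals(k)
    return {m: top_k(s, k, allowed=signals[m]) for m, s in SCORES.items()}


def duplicates(page: Dict[str, List[str]]) -> int:
    """Count distinct items shown by more than one module."""
    seen: Set[str] = set()
    dup: Set[str] = set()
    for items in page.values():
        for item in items:
            (dup if item in seen else seen).add(item)
    return len(dup)


if __name__ == "__main__":
    print(independent_page(2), duplicates(independent_page(2)))
    print(coordinated_page(2), duplicates(coordinated_page(2)))
```

With two slots per module, the independent page repeats both of its items across the two modules, while the signal-conditioned page fills its four slots with four distinct items. In the paper the signals are vectors produced by a trained network rather than hard item sets, so this sketch captures only the coordination idea.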