Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship

Zilong Wang, Shuai Li
DOI: https://doi.org/10.1016/j.tcs.2024.114703
IF: 1.002
2024-01-01
Theoretical Computer Science
Abstract: The problem of two-sided matching markets is well studied in computer science and economics, owing to its diverse applications across numerous domains. Since market participants are usually uncertain about their preferences in various online matching platforms, an emerging line of research is dedicated to the online setting where participants on one side (players) learn their unknown preferences through multiple rounds of interactions with the other side (arms). Sankararaman et al. [23] provide an $\Omega\left(\frac{N\log T}{\Delta^2} + \frac{K\log T}{\Delta}\right)$ regret lower bound for this problem under the serial dictatorship assumption, where $N$ is the number of players, $K \ge N$ is the number of arms, $\Delta$ is the minimum reward gap across players and arms, and $T$ is the time horizon. Serial dictatorship assumes that all arms have the same preferences, which is common in reality when participants on one side share a unified evaluation standard. Recently, Kong and Li [10] proposed the ET-GS algorithm, which achieves an $O\left(\frac{K\log T}{\Delta^2}\right)$ regret upper bound, the best upper bound attained so far. Nonetheless, a gap between the lower and upper bounds persists: the coefficient of the $\frac{\log T}{\Delta^2}$ term is $N$ in the lower bound but $K$ in the upper bound. It remains unclear whether the lower bound or the upper bound needs to be improved. In this paper, we propose a multi-level successive selection algorithm that obtains an $O\left(\frac{N\log T}{\Delta^2} + \frac{K\log T}{\Delta}\right)$ regret bound when the market satisfies serial dictatorship. To the best of our knowledge, we are the first to propose an algorithm that matches the lower bound in the problem of matching markets with bandits.
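To make the serial dictatorship assumption concrete, the following is a minimal sketch of how the stable matching is determined when all arms share one ranking over players: players pick in that order, each taking their most preferred arm that is still free. It is not the paper's multi-level successive selection algorithm. It assumes the mean rewards are already known, whereas in the bandit setting players must learn them. The function name `serial_dictatorship_matching` and the inputs `player_rank` and `mean_rewards` are hypothetical names chosen for illustration.

```python
def serial_dictatorship_matching(player_rank, mean_rewards):
    """Compute the matching under serial dictatorship with KNOWN preferences.

    player_rank: list of player ids, ordered from most to least preferred
                 by the arms (all arms share this ranking; hypothetical input).
    mean_rewards: dict mapping each player id to a list of mean rewards,
                  one per arm (assumed known here; unknown in the paper).
    """
    taken = set()      # arms already claimed by higher-ranked players
    matching = {}      # player id -> arm index
    for player in player_rank:
        free_arms = [a for a in range(len(mean_rewards[player])) if a not in taken]
        # The player takes the free arm with the highest mean reward.
        best_arm = max(free_arms, key=lambda a: mean_rewards[player][a])
        matching[player] = best_arm
        taken.add(best_arm)
    return matching


# Two players (N = 2) and three arms (K = 3); player 0 is ranked first.
rewards = {0: [0.9, 0.5, 0.1], 1: [0.8, 0.7, 0.2]}
matching = serial_dictatorship_matching([0, 1], rewards)
```

Here player 1 also prefers arm 0, but player 0 is ranked higher by every arm and claims it first, so player 1 falls back to its next-best arm.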