Abstract: Online learning in decentralized two-sided matching markets, where the demand side (players) competes to match with the supply side (arms), has received substantial interest because it abstracts out the complex interactions in matching platforms (e.g., UpWork, TaskRabbit). However, past works assume that each arm knows its preference ranking over the players (one-sided learning), and that each player aims to learn its preferences over the arms through successive interactions. Moreover, several (impractical) assumptions on the problem are usually made for theoretical tractability, such as broadcast player-arm matches (Liu et al., 2020; 2021; Kong & Li, 2023) or serial dictatorship (Sankararaman et al., 2021; Basu et al., 2021; Ghosh et al., 2022). In this paper, we study a decentralized two-sided matching market in which we do not assume that the arms know their preference rankings over the players a priori. Furthermore, we do not make any structural assumptions on the problem. We propose a multi-phase explore-then-commit type algorithm, namely epoch-based CA-ETC (collision avoidance explore-then-commit; \texttt{CA-ETC} for short), for this problem that does not require any communication across agents (players and arms) and is hence decentralized. We show that, for an initial epoch length of $T_{\circ}$ and subsequent epoch lengths of $2^{l/\gamma} T_{\circ}$ (for the $l$-th epoch, with $\gamma \in (0,1)$ as an input parameter to the algorithm), \texttt{CA-ETC} yields a player-optimal expected regret of $\mathcal{O}\left(T_{\circ} \left(\frac{K \log T}{T_{\circ} \Delta^2}\right)^{1/\gamma} + T_{\circ} \left(\frac{T}{T_{\circ}}\right)^\gamma\right)$ for the $i$-th player, where $T$ is the learning horizon, $K$ is the number of arms, and $\Delta$ is an appropriately defined problem gap. Furthermore, we propose a blackboard-communication-based baseline achieving logarithmic regret in $T$.
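
To make the epoch schedule concrete, the following is a minimal sketch of how the epoch lengths $2^{l/\gamma} T_{\circ}$ grow over a horizon $T$. It illustrates only the schedule stated in the abstract, not the \texttt{CA-ETC} algorithm itself. The function name `epoch_lengths`, the indexing from $l = 0$ (so that the initial epoch has length $T_{\circ}$), the rounding up to an integer number of rounds, and the truncation of the final epoch at the horizon are all assumptions made for illustration.

```python
import math


def epoch_lengths(horizon, t0, gamma):
    """Return the lengths of successive epochs until `horizon` rounds are used.

    Assumptions (not taken from the paper): epochs are indexed from l = 0,
    lengths are rounded up to whole rounds, and the last epoch is truncated
    so that the lengths sum exactly to `horizon`.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    if horizon <= 0 or t0 <= 0:
        raise ValueError("horizon and t0 must be positive")

    lengths = []
    used = 0  # rounds consumed by earlier epochs
    l = 0     # epoch index
    while used < horizon:
        # Epoch l has nominal length 2^(l / gamma) * T_0.
        length = math.ceil(2 ** (l / gamma) * t0)
        # Do not run past the learning horizon.
        length = min(length, horizon - used)
        lengths.append(length)
        used += length
        l += 1
    return lengths


schedule = epoch_lengths(horizon=1000, t0=10, gamma=0.5)
print(schedule)  # [10, 40, 160, 640, 150]
```

With $\gamma = 0.5$ each nominal epoch is four times longer than the previous one, so only a handful of epochs are needed to cover the horizon; a smaller $\gamma$ makes the epochs grow faster still.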

Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility

Player-optimal Stable Regret for Bandit Learning in Matching Markets

Decentralized Competing Bandits in Many-to-One Matching Markets

Dynamic Matching Bandit For Two-Sided Online Markets

Bandit Learning in Many-to-One Matching Markets

Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship

Bandit Learning in Decentralized Matching Markets

Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets

Competing Bandits in Non-Stationary Matching Markets

Incentive-Aware Recommender Systems in Two-Sided Markets

Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints

Incentives in Two-sided Matching Markets with Prediction-enhanced Preference-formation

Competing Bandits in Decentralized Large Contextual Matching Markets

Learning Optimal Stable Matches in Decentralized Markets with Unknown Preferences

Bandit Learning in Matching Markets: Utilitarian and Rawlsian Perspectives

Thompson Sampling for Bandit Learning in Matching Markets

Learning in Multi-Stage Decentralized Matching Markets

Online Matching with Stochastic Rewards: Advanced Analyses Using Configuration Linear Programs

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Putting Gale & Shapley to Work: Guaranteeing Stability Through Learning