Abstract: The matching-market problem has long been studied because of its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent work studies the online setting, where participants on one side (players) learn their unknown preferences through iterative interactions with the other side (arms). Most previous works in this line derive theoretical guarantees only for the player-pessimal stable regret, which is defined with respect to the players' least-preferred stable matching. However, under the pessimal stable matching, players obtain only the least reward among all stable matchings. To maximize players' profits, the player-optimal stable matching is the most desirable. Though \citet{basu21beyond} derive an upper bound on the player-optimal stable regret, their bound can be exponentially large when the players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms. This result significantly improves on previous works, which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.
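
To illustrate the gap between the player-optimal and the player-pessimal stable matching discussed in the abstract, the following is a minimal sketch of the classical Gale-Shapley deferred-acceptance procedure on a toy market. It is not the ETGS algorithm itself: preferences are assumed to be fully known, and the function name `gale_shapley`, the player and arm labels, and the preference lists are all hypothetical. Running the procedure with players proposing yields the player-optimal stable matching; running it with arms proposing yields the player-pessimal one.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Deferred acceptance with known preferences (illustrative sketch).

    Both arguments map each agent to a list of agents on the other side,
    ordered from most to least preferred. Returns a dict proposer -> receiver
    describing the proposer-optimal stable matching.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    held = {}                                     # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has been rejected by everyone and stays unmatched
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in held:
            held[r] = p                  # r was free and accepts tentatively
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])         # r trades up; the old proposer is free again
            held[r] = p
        else:
            free.append(p)               # r rejects p
    return {p: r for r, p in held.items()}


# Hypothetical toy market with two stable matchings.
player_prefs = {"p1": ["a1", "a2"], "p2": ["a2", "a1"]}
arm_prefs = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}

# Players propose: each player gets its best stable partner.
player_optimal = gale_shapley(player_prefs, arm_prefs)

# Arms propose: each player gets its worst stable partner.
arm_to_player = gale_shapley(arm_prefs, player_prefs)
player_pessimal = {p: a for a, p in arm_to_player.items()}

print(player_optimal)
print(player_pessimal)
```

In this toy market every player receives its first choice under the player-proposing run and its last choice under the arm-proposing run, which is the difference that player-optimal stable regret is meant to capture.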

Dynamic Matching Bandit For Two-Sided Online Markets

Competing Bandits in Non-Stationary Matching Markets

Decentralized Competing Bandits in Many-to-One Matching Markets

Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints

Bandit Learning in Many-to-One Matching Markets

Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility

Competing Bandits in Decentralized Large Contextual Matching Markets

Bandit Learning in Matching Markets: Utilitarian and Rawlsian Perspectives

Bandit Learning in Decentralized Matching Markets

A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments

Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship

Online Learning Bipartite Matching with Non-stationary Distributions

Online Matching with Stochastic Rewards: Advanced Analyses Using Configuration Linear Programs

Player-optimal Stable Regret for Bandit Learning in Matching Markets

Adaptive Dynamic Bipartite Graph Matching: A Reinforcement Learning Approach

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Online Matching Frameworks under Stochastic Rewards, Product Ranking, and Unknown Patience

Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets

Exploring the Tradeoff between Competitive Ratio and Variance in Online-Matching Markets

Learning Optimal Stable Matches in Decentralized Markets with Unknown Preferences

A Unified Model for Bi-objective Online Stochastic Bipartite Matching with Two-sided Limited Patience