Abstract: Recently, the online matching problem has attracted much attention due to its wide application in real-world decision-making scenarios. In stationary environments, existing methods adopt the stochastic user arrival model to learn the optimal dual prices and are shown to achieve fast regret bounds. However, the stochastic model is no longer a proper assumption when the environment changes, and methods built on it can be overly optimistic and perform poorly. In this paper, we study the online matching problem in dynamic environments, in which the optimal dual prices are allowed to vary over time. We bound the dynamic regret of the online matching problem by the sum of two quantities: the regret of an online max-min problem and the dynamic regret of an online convex optimization (OCO) problem. We then propose a novel online approach, the Primal-Dual Online Algorithm (PDOA), to minimize both quantities. In particular, PDOA adopts the primal-dual framework and optimizes the dual prices with the online gradient descent (OGD) algorithm to eliminate the regret of the online max-min problem. Moreover, it maintains a set of OGD experts and combines them via an expert-tracking algorithm, which yields a sublinear dynamic regret bound for the OCO problem. We show that PDOA achieves an $O(K\sqrt{T(1+P_T)})$ dynamic regret, where $K$ is the number of resources, $T$ is the number of iterations, and $P_T$ is the path length of any potential dual price sequence, which reflects how the environment changes. Finally, experiments on real applications demonstrate the superiority of our approach.
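The sketch below is a minimal illustration of the primal-dual structure described above. It is not the authors' PDOA implementation. It rests on these assumptions:

- Online matching is cast in a generic online resource-allocation form. Each request offers actions with a reward and a K-dimensional resource consumption, and the total budget is `rho` times T.
- The dual prices live in a hypothetical box [0, `p_max`]^K.
- The OGD experts use a geometric grid of step sizes.
- The expert-tracking algorithm is a Hedge-style exponentially weighted average over linearized losses.

All function names and parameters (`pdoa_sketch`, `choose_action`, `eta0`, `num_experts`, `p_max`, `alpha`) are hypothetical.

```python
import math


def project(p, p_max):
    """Clip dual prices onto the box [0, p_max]^K (an assumed bounded domain)."""
    return [min(max(x, 0.0), p_max) for x in p]


def choose_action(prices, rewards, costs, remaining):
    """Primal step: pick the affordable action with the largest price-adjusted reward.

    Returns None (reject the request) if no affordable action has positive value.
    """
    best, best_val = None, 0.0
    for j, (r, c) in enumerate(zip(rewards, costs)):
        if any(ck > bk for ck, bk in zip(c, remaining)):
            continue  # not enough of some resource left for this action
        val = r - sum(pk * ck for pk, ck in zip(prices, c))
        if val > best_val:
            best, best_val = j, val
    return best


def pdoa_sketch(requests, rho, eta0=0.01, num_experts=5, p_max=10.0, alpha=1.0):
    """Hypothetical primal-dual loop whose dual prices come from an expert tracker.

    requests: list of (rewards, costs); costs[j] is the length-K consumption of action j.
    rho: per-round budget vector of length K, so the total budget is rho * T.
    Returns (total_reward, remaining_budget, last_played_prices).
    """
    K, T = len(rho), len(requests)
    remaining = [b * T for b in rho]
    etas = [eta0 * 2 ** i for i in range(num_experts)]  # assumed geometric step-size grid
    experts = [[0.0] * K for _ in etas]  # each OGD expert keeps its own price vector
    weights = [1.0 / num_experts] * num_experts
    total_reward, prices = 0.0, [0.0] * K
    for rewards, costs in requests:
        # Meta step: play the weighted average of the experts' prices.
        prices = [sum(w * e[k] for w, e in zip(weights, experts)) for k in range(K)]
        j = choose_action(prices, rewards, costs, remaining)
        used = list(costs[j]) if j is not None else [0.0] * K
        if j is not None:
            total_reward += rewards[j]
            remaining = [b - u for b, u in zip(remaining, used)]
        # Subgradient of the dual loss f_t(p) = max(0, max_j r_j - <p, c_j>) + <p, rho>,
        # using the realized consumption (it matches the argmax while budgets are slack).
        grad = [rho[k] - used[k] for k in range(K)]
        # Hedge on the linearized losses <grad, p_i>, which equal the surrogate
        # <grad, p_i - p_t> up to a constant; shift by the minimum for stability.
        losses = [sum(g * x for g, x in zip(grad, e)) for e in experts]
        min_loss = min(losses)
        weights = [w * math.exp(-alpha * (loss - min_loss))
                   for w, loss in zip(weights, losses)]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Each expert runs projected OGD with its own step size on the same gradient.
        experts = [project([x - eta * g for x, g in zip(e, grad)], p_max)
                   for eta, e in zip(etas, experts)]
    return total_reward, remaining, prices
```

Example usage: call `pdoa_sketch(requests, [0.3])` with 100 identical requests, each offering an expensive action (reward 1.0, cost 1.0) and a cheaper one (reward 0.5, cost 0.2). While the expensive action over-consumes the per-round budget, the subgradient is negative, so every expert, and hence the played price, moves upward. The step-size grid and the Hedge rate `alpha` are illustrative choices; the abstract does not specify the paper's exact expert-tracking rule or parameters.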

Algorithm of matching law based on optimal policy search model

A dynamical policy search model for matching law

A Stochastic Policy Search Model for Matching Behavior

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

Model-Based Robot Learning Control with Uncertainty Directed Exploration

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Finding Optimal Observation-Based Policies for Constrained POMDPs under the Expected Average Reward Criterion

Policy Optimization with Model-based Explorations

Matching-Based Policy Learning

Finding Optimal Memoryless Policies of POMDPs under the Expected Average Reward Criterion

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Optimal bipartite graph matching-based goal selection for policy-based hindsight learning

Towards Efficient Exact Optimization of Language Model Alignment

A two-sided matching decision-making approach based on prospect theory under the probabilistic linguistic environment

Matching provides efficient decisions

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

Decision Making in Non-Stationary Environments with Policy-Augmented Search

A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments

Adaptive Online Packing-guided Search for POMDPs

A Probability-Based Value Iteration on Optimal Policy Algorithm for POMDP

Proximal policy optimization via enhanced exploration efficiency