Abstract: Devising a dynamic pricing policy with always-valid online statistical learning procedures is an important and as yet unresolved problem. Most existing dynamic pricing policies focus on the faithfulness of the adopted customer choice models and have limited ability to adapt to the online uncertainty of the learned statistical models during the pricing process. In this article, we propose a novel approach for designing a dynamic pricing policy based on regularized online statistical learning with theoretical guarantees. The new approach overcomes the challenge of continuously monitoring the online Lasso procedure and possesses several appealing properties. In particular, we make the decisive observation that the always-validity of pricing decisions builds and thrives on the online regularization scheme. Our online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (OORMLP) policy with three major advantages: it encodes market noise knowledge into pricing process optimism; it empowers online statistical learning with always-validity over all decision points; and it envelops the prediction error process with time-uniform non-asymptotic oracle inequalities. Non-asymptotic inference results of this type allow us to design more sample-efficient and robust dynamic pricing algorithms in practice. In theory, the proposed OORMLP algorithm exploits the sparsity structure of high-dimensional models and secures logarithmic regret over the decision horizon. These theoretical advances are made possible by an optimistic online Lasso procedure that resolves dynamic pricing problems at the process level, based on a novel use of non-asymptotic martingale concentration. In experiments, we evaluate OORMLP in different synthetic and real pricing problem settings and demonstrate that it improves on state-of-the-art methods. Supplementary materials for this article are available online.

$ε$-Policy Gradient for Online Pricing

Online Learning and Pricing for Multiple Products with Reference Price Effects

Phase Transitions in Learning and Earning under Price Protection Guarantee

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

Stochastic Cubic-Regularized Policy Gradient Method

Online Regularization toward Always-Valid High-Dimensional Dynamic Pricing

Policy gradient learning methods for stochastic control with exit time and applications to share repurchase pricing

Online Policy Optimization in Unknown Nonlinear Systems

Online Policy Learning and Inference by Matrix Completion

Policy Mirror Descent Inherently Explores Action Space

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

A nearly Blackwell-optimal policy gradient method

Earning and Learning with Varying Cost

OptiGrad: A Fair and more Efficient Price Elasticity Optimization via a Gradient Based Learning

Dynamic Pricing and Learning with Long-term Reference Effects

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

Sublinear Regret with Barzilai-Borwein Step Sizes

Online Pricing with Offline Data: Phase Transition and Inverse Square Law

No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand

Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift