Abstract: We consider the periodic-review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s,S,p) policy has been proved optimal, where s and S are the reorder point and order-up-to level of the ordering strategy, and p, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s,S,p) policy with a tight O(T^(1/2)) regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal (s,S,p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, whereas the majority of learning problems in operations management require solving only single-period optimization problems. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The need to analyze action-dependent state transitions over multiple periods resembles a reinforcement learning problem, which is considerably more difficult than the bandit learning settings studied previously. Second, the pricing function p is infinite-dimensional, and approximating it is much more challenging than approximating a finite number of parameters, as in existing research. The demand-price relationship is estimated using an upper confidence bound, but the confidence interval cannot be calculated explicitly due to the complexity of the DP recursion. Finally, due to the multi-period nature of (s,S,p) policies, the actual distribution of the randomness in demand, which is unknown to the learner a priori, plays an important role in determining the optimal pricing strategy p. In this paper, the demand randomness is approximated by an empirical distribution constructed from dependent samples, and a novel Wasserstein-metric-based argument is employed to prove convergence of the empirical distribution.
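
To make the structure of an (s,S,p) policy concrete, the following is a minimal simulation sketch. It is not the paper's learning algorithm. Everything beyond the policy structure is an illustrative assumption: the function names, the linear demand model with Gaussian noise, all cost parameters, the convention that an order is triggered when inventory is at or below s, and the choice to evaluate the price at the post-ordering inventory level.

```python
import random

def ssp_decision(x, s, S, price_fn):
    """Return (post-ordering inventory y, price) for starting inventory x."""
    # Assumed convention: order up to S when inventory is at or below s.
    y = S if x <= s else x
    # Assumed convention: the price is a function of post-ordering inventory.
    return y, price_fn(y)

def simulate(T, s, S, price_fn, seed=0):
    """Average per-period profit of an (s, S, p) policy over T periods."""
    rng = random.Random(seed)
    # Hypothetical parameters: fixed cost, unit cost, holding, backlog.
    K, c, h, b = 10.0, 1.0, 0.5, 2.0
    x, total = 0.0, 0.0
    for _ in range(T):
        y, p = ssp_decision(x, s, S, price_fn)
        # Hypothetical price-dependent demand: linear mean plus noise.
        demand = max(0.0, 20.0 - 2.0 * p + rng.gauss(0.0, 1.0))
        # The fixed cost K is paid only in periods where an order is placed.
        order_cost = K + c * (y - x) if y > x else 0.0
        x = y - demand  # a negative value means demand is backlogged
        total += p * demand - order_cost - h * max(x, 0.0) - b * max(-x, 0.0)
    return total / T

def example_price(y):
    """Hypothetical pricing rule: charge less when inventory is high."""
    return 6.0 - 0.05 * y

if __name__ == "__main__":
    print(simulate(1000, s=5.0, S=20.0, price_fn=example_price))
```

In the incomplete-information setting described above, the learner cannot evaluate such a simulation directly, since both the demand-price relationship and the noise distribution are unknown and must be estimated from observed sales.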

Solving a Joint Pricing and Inventory Control Problem for Perishables via Deep Reinforcement Learning

Solving Inventory Management Problems Through Deep Reinforcement Learning

Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment

Spatial-temporal Pricing for Ride-Sourcing Platform with Reinforcement Learning

Deep Inventory Management

Scalable multi-product inventory control with lead time constraints using reinforcement learning

Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information

A Reinforcement Learning Method for Inventory Control under State-based Stochastic Demand

Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand

Managing Perishable Inventory Systems with Age-differentiated Demand

Deep Reinforcement Learning Approach for Capacitated Supply Chain optimization under Demand Uncertainty

Distributed Dynamic Pricing Strategy Based on Deep Reinforcement Learning Approach in a Presale Mechanism

Deep Reinforcement Learning for Large-Scale Inventory Management

Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands

Dynamic Ordering and Pricing for a Perishable Goods Supply Chain

Multi-echelon inventory optimization using deep reinforcement learning

Deep reinforcement learning for demand fulfillment in online retail

Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

Reinforcement Learning for Optimizing Can-Order Policy with the Rolling Horizon Method

A deep Q-learning approach to optimize ordering and dynamic pricing decisions in the presence of strategic customers

Can Deep Reinforcement Learning Improve Inventory Management? Performance on Dual Sourcing, Lost Sales and Multi-Echelon Problems