Abstract: We consider the periodic-review dynamic pricing and inventory control problem with a fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s,S,p) policy has been proved to be optimal, where s and S are the reorder point and the order-up-to level of the ordering strategy, and p, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s,S,p) policy at a tight O(T^(1/2)) regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal (s,S,p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, unlike the majority of learning problems in OM, which only require solving single-period optimization problems. It is hence challenging to establish stability results through the DP recursions; we accomplish this by proving uniform convergence of the profit-to-go function. The need to analyze action-dependent state transitions over multiple periods resembles reinforcement learning and is considerably more difficult than the settings handled by existing bandit learning algorithms. Second, the pricing function p is infinite-dimensional, and learning it is much more challenging than learning the finite number of parameters considered in existing research. The demand-price relationship is estimated using an upper confidence bound, but the confidence interval cannot be calculated explicitly because of the complexity of the DP recursion. Finally, because of the multi-period nature of (s,S,p) policies, the actual distribution of the demand randomness, which is unknown to the learner a priori, plays an important role in determining the optimal pricing strategy p.
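To make the structure of an (s,S,p) policy concrete, the following is a minimal Python sketch of how a given stationary policy would be applied in a single period. It is not the paper's learning algorithm. The class `SSPPolicy`, the helper `one_period_profit`, the cost parameters `K`, `c`, `h`, `b`, and the linear price and demand functions in the usage lines are all hypothetical. For illustration we also assume that the price is evaluated at the inventory level after the ordering decision and that revenue is collected on all demand, including backlogged demand.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SSPPolicy:
    """Hypothetical stationary (s, S, p) policy, used only for illustration."""
    s: float                      # reorder point
    S: float                      # order-up-to level
    p: Callable[[float], float]   # price as a function of post-order inventory (assumed)

    def __post_init__(self) -> None:
        if self.s > self.S:
            raise ValueError("reorder point s must not exceed order-up-to level S")

    def act(self, x: float) -> Tuple[float, float]:
        """Return (order quantity, price) for beginning-of-period inventory x.

        A negative x represents backlogged demand. An order is placed only
        when x is strictly below s, and it raises the inventory level to S.
        """
        order = self.S - x if x < self.s else 0.0
        y = x + order             # inventory level after ordering
        return order, self.p(y)


def one_period_profit(policy: SSPPolicy, x: float,
                      demand: Callable[[float], float],
                      K: float, c: float, h: float, b: float) -> Tuple[float, float]:
    """Apply the policy for one period and return (profit, next inventory level).

    K: fixed ordering cost, c: unit ordering cost,
    h: unit holding cost, b: unit backlog cost (all hypothetical parameters).
    """
    order, price = policy.act(x)
    y = x + order
    d = demand(price)             # realized demand at the chosen price
    ordering_cost = (K + c * order) if order > 0 else 0.0
    leftover = y - d              # a negative value is backlog carried forward
    inventory_cost = h * max(leftover, 0.0) + b * max(-leftover, 0.0)
    return price * d - ordering_cost - inventory_cost, leftover


if __name__ == "__main__":
    policy = SSPPolicy(s=10.0, S=50.0, p=lambda y: 20.0 - y / 10.0)
    print(policy.act(5.0))        # (45.0, 15.0): order up to S, price at p(S)
    print(one_period_profit(policy, 5.0, lambda p: 70.0 - 2.0 * p,
                            K=100.0, c=2.0, h=1.0, b=4.0))  # (400.0, 10.0)
```

The fixed cost `K` is what makes the gap between s and S worthwhile: small shortfalls are not worth paying `K` for, so the sketch orders only once inventory drops below s.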
To address this, we approximate the demand randomness by an empirical distribution constructed from dependent samples and employ a novel Wasserstein-metric-based argument to prove the convergence of this empirical distribution.
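One way such dependent samples can arise, assumed here purely for illustration, is through residuals of a fitted demand curve. The Python sketch below assumes an additive demand model D = λ(p) + ε with a linear mean demand λ(p) fitted by ordinary least squares. OLS is a swapped-in stand-in, not the paper's UCB-based estimator. Because the fitted coefficients use every observation, the resulting residuals are dependent samples of ε. The sketch then measures the distance between two empirical distributions with the one-dimensional Wasserstein-1 distance, which equals the integral of the absolute difference between the two CDFs. The function names `ols_residuals` and `wasserstein1` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def ols_residuals(prices: Sequence[float], demands: Sequence[float]) -> List[float]:
    """Residuals d_i - (a + beta * p_i) of a least-squares linear fit.

    Stand-in for the paper's UCB-based estimator (assumption). Since the
    fitted coefficients depend on every observation, the residuals are
    dependent samples of the demand noise.
    """
    n = len(prices)
    if n != len(demands) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_p = sum(prices) / n
    mean_d = sum(demands) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    if sxx == 0:
        raise ValueError("prices must not all be equal")
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, demands))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_p
    return [d - (intercept + slope * p) for p, d in zip(prices, demands)]


def wasserstein1(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 distance between the empirical distributions of samples a and b.

    In one dimension W1 is the integral of |F_a(t) - F_b(t)| dt. Both
    empirical CDFs are constant between consecutive pooled sample points
    (and agree outside the pooled range), so the integral is a finite sum.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    grid = sorted(set(sa) | set(sb))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        fa = bisect_right(sa, left) / len(sa)   # F_a on [left, right)
        fb = bisect_right(sb, left) / len(sb)   # F_b on [left, right)
        total += abs(fa - fb) * (right - left)
    return total


if __name__ == "__main__":
    # Hypothetical observed (price, demand) pairs.
    res = ols_residuals([1.0, 2.0, 3.0, 4.0], [9.0, 5.0, 5.0, 1.0])
    print(res)
    print(wasserstein1(res[:2], res[2:]))
```

Sorting the pooled support once keeps the cost at O(n log n); for cross-checking, SciPy's `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity.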

Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information