Abstract: Problem Definition: This paper considers a setting in which an airline company sells seats periodically, and each period consists of two selling phases: an early-bird discount phase and a regular-price phase. In each period, when the early-bird discount seats are stocked out, an early-bird customer who comes for a discounted seat either purchases a regular-price seat as a substitute (called buy-up substitution) or simply leaves.

Methodology/Results: The optimal inventory level of the discounted seats reserved for the early-bird sale is a critical decision for the airline company to maximize its revenue. The airline company learns about the demands for both discounted and regular-price seats and the buy-up substitution probability from historical sales data, which, in turn, are affected by past inventory allocation decisions. In this paper, we investigate two information scenarios based on whether lost sales are observable, and we provide the corresponding Bayesian updating mechanism for learning about demand parameters and the substitution probability. We then construct a dynamic programming model to derive the Bayesian optimal inventory level decisions in a multiperiod setting. The literature finds that the unobservability of lost sales drives the inventory manager to stock more (i.e., the Bayesian optimal inventory level should be kept higher than the myopic inventory level) to observe and learn more about demand distributions. Here, we show that when the buy-up substitution probability is known, one may stock less, because one can infer some information about the primary demand for the discounted seat from customer substitution behavior. We also find that learning about the unknown buy-up substitution probability drives the inventory manager to stock less so as to induce more substitution trials. Finally, we develop a SoftMax algorithm to solve our dynamic programming problem. We show that the stock-more (stock-less) result can be used to speed up the convergence of the algorithm to the optimal solution.

Managerial Implications: Our results shed light on the airline seat protection level decision when the airline learns about demand parameters and the buy-up substitution probability. Compared with myopic optimization, Bayesian inventory decisions that consider the exploration-exploitation tradeoff can avoid getting stuck in local optima and improve revenue. We also identify new driving forces behind the stock-more (stock-less) result that complement the Bayesian inventory management literature.

Funding: Z. Luo acknowledges the financial support of the Internal Start-up Fund of The Hong Kong Polytechnic University [Grant P0039035]. P. Guo acknowledges the financial support of the Research Grants Council of Hong Kong [Grant 15508518]. Y. Wang’s work was supported by the Research Grants Council of Hong Kong [Grant 15505318] and the National Natural Science Foundation of China [Grant 71971184].

Supplemental Material: The e-companion is available at https://doi.org/10.1287/msom.2022.1169.
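
As an illustration only, the intuition that stocking fewer discounted seats induces more substitution trials can be sketched with a conjugate Beta-Bernoulli update. This is not the paper's updating mechanism, which the abstract does not spell out. The sketch assumes the observable-lost-sales scenario, a Beta prior on the buy-up substitution probability, and that each unmet early-bird customer independently buys up or leaves. The names `BetaBelief` and `substitution_trials` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Assumed Beta(alpha, beta) belief over the buy-up substitution probability."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Posterior mean of the buy-up probability under the Beta belief.
        return self.alpha / (self.alpha + self.beta)

    def update(self, buy_ups: int, walk_aways: int) -> "BetaBelief":
        # Conjugate update: each unmet early-bird customer is treated as one
        # Bernoulli trial (buy up or leave). Requires lost sales to be observable.
        if buy_ups < 0 or walk_aways < 0:
            raise ValueError("counts must be non-negative")
        return BetaBelief(self.alpha + buy_ups, self.beta + walk_aways)


def substitution_trials(early_demand: int, discount_seats: int) -> int:
    # Number of early-bird customers who arrive after the discounted seats
    # are stocked out; only these customers reveal buy-up behavior.
    return max(early_demand - discount_seats, 0)


if __name__ == "__main__":
    prior = BetaBelief(1.0, 1.0)            # uniform prior (an assumption)
    trials = substitution_trials(12, 8)     # 4 unmet early-bird customers
    posterior = prior.update(buy_ups=3, walk_aways=trials - 3)
    print(trials, posterior, round(posterior.mean(), 3))
```

With a larger discounted-seat allocation (say 12 or more seats against the same demand of 12), `substitution_trials` returns 0 and the belief is not updated at all, which is the sense in which a lower inventory level buys information about the substitution probability.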

Learning an Inventory Control Policy with General Inventory Arrival Dynamics

Production-inventory Control Policy under Warm/cold State-Dependent Fixed Costs and Stochastic Demand: Partial Characterization and Heuristics

Deep Inventory Management

Performance Bounds and Asymptotic Optimality of Modified (r, Q) Policies for Stochastic Distribution Inventory Systems

Dynamic Inventory Control with Stockout Substitution and Demand Learning

A Minibatch Stochastic Gradient Descent-Based Learning Metapolicy for Inventory Systems with Myopic Optimal Policy

Online Policy Selection for Inventory Problems

A Deep Reinforcement Learning Approach for Inventory Control under Stochastic Lead Time and Demand

A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy

Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide

Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies

VC Theory for Inventory Policies

Dynamic Stochastic Inventory Management in E-Grocery Retailing

Dynamic inventory replenishment strategy for aerospace manufacturing supply chain: combining reinforcement learning and multi-agent simulation

Adaptive Inventory Control for Nonstationary Demand and Partial Information

Partial Backorder Inventory System: Asymptotic Optimality and Demand Learning

Manage Inventories with Learning on Demands and Buy-up Substitution Probability

Learning General Inventory Management Policy for Large Supply Chain Network

Contextual Bandits for Evaluating and Improving Inventory Control Policies

A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information