Learning an Inventory Control Policy with General Inventory Arrival Dynamics

Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade
2023-10-26
Abstract: In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time arrivals model (QOT). We also allow for order quantities to be modified as a post-processing step to meet vendor constraints such as order minimum and batch size constraints -- a common practice in real supply chains. To the best of our knowledge this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022) we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problem. In our case, we incorporate a deep generative model for the arrivals process as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Via simulation studies we show that this approach yields statistically significant improvements in profitability over production baselines. Using data from a real-world A/B test, we show that Gen-QOT generalizes well to off-policy data and that the resulting buying policy outperforms traditional inventory management systems in real-world settings.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to learn and backtest inventory control policies under general inventory arrival dynamics, which the authors call a quantity-over-time (QOT) arrivals model. It also allows order quantities to be modified in a post-processing step to satisfy vendor constraints, such as minimum order quantities and batch sizes, which is common practice in real supply chains. To the authors' knowledge, this is the first work to handle either arbitrary arrival dynamics or arbitrary downstream post-processing of order quantities. The method formulates the periodic review inventory control problem as an exogenous decision process, in which most of the state is outside the agent's control, and applies results from Madeka et al. (2022) to reduce the problem to supervised learning. A deep generative model of the arrivals process is incorporated into a simulator that replays historical data. In simulation studies, the approach yields statistically significant improvements in profitability over production baselines. Using data from a real-world A/B test, the paper further shows that the proposed Gen-QOT model generalizes well to off-policy data and that the resulting buying policy outperforms traditional inventory management systems in real-world settings. In summary, the paper aims to learn and deploy inventory control policies under complex and uncertain arrival dynamics while accounting for the complications of real supply chains, such as multiple shipments, unreliable supply, and order modifications. By combining the new QOT arrivals model with deep learning techniques, it proposes an inventory management approach that improves profitability and adapts to real-world conditions.
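To make the two ideas above concrete, the following is a minimal sketch, not the paper's implementation. It assumes one simple post-processing rule (raise any positive order to the vendor minimum, then round up to whole batches) and represents quantity-over-time arrivals as a list of fractions of the order that arrive in each period after ordering. The function names `postprocess_order` and `arrivals_over_time`, and both rules, are hypothetical and chosen only for illustration.

```python
def postprocess_order(quantity: int, min_order: int, batch_size: int) -> int:
    """Hypothetical vendor-constraint rule (an assumption, not from the paper).

    A non-positive order stays at zero. A positive order is raised to the
    vendor minimum and then rounded up to a whole number of batches.
    """
    if quantity <= 0:
        return 0
    quantity = max(quantity, min_order)
    # Integer ceiling division avoids floating-point rounding issues.
    batches = -(-quantity // batch_size)
    return batches * batch_size


def arrivals_over_time(order_quantity: int, fractions: list[float]) -> list[float]:
    """Split one order into per-period arrivals.

    fractions[k] is the assumed share of the order arriving k periods after
    it is placed. The shares may sum to less than 1, which models an
    under-filled (unreliable) order delivered across multiple shipments.
    """
    return [order_quantity * f for f in fractions]


# The policy asks for 7 units; the vendor requires at least 10, in cases of 6.
placed = postprocess_order(7, min_order=10, batch_size=6)

# Nothing arrives in the ordering period, half arrives one period later,
# a quarter two periods later, and the remaining quarter never arrives.
shipments = arrivals_over_time(placed, [0.0, 0.5, 0.25])
```

In this sketch the quantity the policy requests, the quantity actually placed, and the quantity that eventually arrives can all differ, which is the gap the summary above describes between order decisions and realized inventory.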